Read file with strings and integers in matlab - file

I have to read a file in Matlab that looks like this:
D:\Classified\positive-videos\vid.avi 163 3 14 32 54 79 105 130 155 202 216 224 238 250 262 288 288 322 357 369 381 438 457 478 499 525 551
D:\Classified\positive-videos\vid2.avi 163 3 14 32 54 79 105 130 155 202 216 224 238 250 262 288 288 322 357 369 381 438 457 478 499 525 551
There are many such lines, separated by newlines. I need to discard the video path and the first integer (e.g. 163 in the first line) and read all the remaining numbers into an array, up to the end of the line. How can this be done?

You could do the following:
fid = fopen('test1.txt','r');
my_line = fgetl(fid);
while ischar(my_line)   % fgetl returns -1 (not a char) at end of file
    my_array = regexp(my_line,' ','split');
    my_line = fgetl(fid);
    disp(my_array(3:end));
end
fclose(fid);
This would give you:
ans =
Columns 1 through 11
'3' '14' '32' '54' '79' '105' '130' '155' '202' '216' '224'
Columns 12 through 22
'238' '250' '262' '288' '288' '322' '357' '369' '381' '438' '457'
Columns 23 through 26
'478' '499' '525' '551'
ans =
Columns 1 through 11
'3' '14' '32' '54' '79' '105' '130' '155' '202' '216' '224'
Columns 12 through 22
'238' '250' '262' '288' '288' '322' '357' '369' '381' '438' '457'
Columns 23 through 26
'478' '499' '525' '551'
EDIT
For a numeric matrix result you can change it as:
clear;
close;
clc;
fid = fopen('test1.txt','r');
my_line = fgetl(fid);
my_array = regexp(my_line,' ','split');
my_matrix = zeros(0, numel(my_array(3:end)));
ii = 1;
while ischar(my_line)   % fgetl returns -1 (not a char) at end of file
    my_array = regexp(my_line,' ','split');
    my_line = fgetl(fid);
    my_matrix = [my_matrix;zeros(1,size(my_matrix,2))];
    for jj=1:numel(my_array(3:end))
        my_matrix(ii,jj) = str2double(my_array{jj+2});
    end
    ii = ii + 1;
end
fclose(fid);
This would yield:
my_matrix =
3 14 32 54 79 105 130 155 202 216 224 238 250 262 288 288 322 357 369 381 438 457 478 499 525 551
3 14 32 54 79 105 130 155 202 216 224 238 250 262 288 288 322 357 369 381 438 457 478 499 525 551

A much easier method:
fid = importdata(filename)
results = fid.data;
Ad maiora.
EDIT
Since you want to discard the first value after the string, you will have to use
res = fid.data(:,2:end);
instead of results.


Google Data Studio: Compare daily sales to 7-day average

I have a data source with daily sales per product.
I want to create a field that calculates the average daily sales for the 7 last days, for each product and day (e.g. on day 10 for product A, it will give me the average sales for product A on days 3 - 9; on Day 15 for product B, I'll see the average sales of B on days 8 - 14).
Is this possible?
Example data (I have the first 3 columns. need to generate the fourth)
Date Product Sales 7-Day Average
1/11 A 983 201
2/11 A 650 983
3/11 A 328 817
4/11 A 728 654
5/11 A 246 672
6/11 A 613 587
7/11 A 575 591
8/11 A 601 589
9/11 A 462 534
10/11 A 979 508
11/11 A 148 601
12/11 A 238 518
13/11 A 53 517
14/11 A 500 437
15/11 A 684 426
16/11 A 261 438
17/11 A 69 409
18/11 A 159 279
19/11 A 964 281
20/11 A 429 384
21/11 A 731 438
1/11 B 790 471
2/11 B 265 486
3/11 B 94 487
4/11 B 66 490
5/11 B 124 477
6/11 B 555 357
7/11 B 190 375
8/11 B 232 298
9/11 B 747 218
10/11 B 557 287
11/11 B 432 353
12/11 B 526 405
13/11 B 690 463
14/11 B 350 482
15/11 B 512 505
16/11 B 273 545
17/11 B 679 477
18/11 B 164 495
19/11 B 799 456
20/11 B 749 495
21/11 B 391 504
I haven't really tried anything; I couldn't figure out how to get started with this.
This may not be a perfect solution, but it does give your expected result in a crude way.
Cross-join the data source with itself first (Table 1 and Table 2 in the formula).
Then use a calculated field to get the average of the last 7 days:
(CASE WHEN Date (Table 2) BETWEEN DATETIME_SUB(Date (Table 1), INTERVAL 7 DAY) AND DATETIME_SUB(Date (Table 1), INTERVAL 1 DAY) THEN Sales (Table 2) ELSE 0 END)/7

What is the behavior of iscntrl?

The function iscntrl is standardized. Unfortunately, C99 says only:
The iscntrl function tests for any control character
Considering the prototype, int iscntrl(int c);, I expected it to return true for 0..31 and perhaps 127 too. However, with the following program:
#include <stdio.h>
#include <ctype.h>
int main()
{
    int i;
    printf("The ASCII value of all control characters are ");
    for (i=0; i<=1024; ++i)
    {
        if (iscntrl(i)!=0)
            printf("%d ", i);
    }
    return 0;
}
I get this output:
The ASCII value of all control characters are 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14
15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 127 264 288 308 310 320 334
336 346 348 372 374 390 398 404 406 412 420 428 436 444 452 458 460 466 468 474
476 484 492 500 506 512 518 530 536 542 638 644 656 662 668 682 688 694 700 706
708 714 716 718 760 774 780 782 788 798 826 834 836 846 854 856 864 866 874 876
882 888 890 892 898 900 908 962 968 970 988 994 1000
So I am wondering how this function is implemented behind the scenes. I tried searching the glibc source, but the answer is not obvious.
https://github.com/bminor/glibc/search?q=iscntrl&unscoped_q=iscntrl
Any ideas?
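You are invoking undefined behavior by passing improper values to iscntrl(): your loop runs up to 1024, but the argument must be representable as an unsigned char (0 to 255 when CHAR_BIT is 8) or equal EOF.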
Per 7.4 Character handling <ctype.h>, paragraph 1:
The header <ctype.h> declares several functions useful for classifying and mapping characters. In all cases the argument is an int, the value of which shall be representable as an unsigned char or shall equal the value of the macro EOF. If the argument has any other value, the behavior is undefined.
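To make the constraint concrete, here is a minimal sketch of a range-checked wrapper. The names is_control_checked and count_control_chars are hypothetical helpers, not part of <ctype.h>; the only library call used is iscntrl() itself.

```c
#include <ctype.h>
#include <limits.h>
#include <stdio.h>

/* Hypothetical helper: returns 1 if c is a control character, 0 otherwise.
   Values that <ctype.h> does not allow (anything other than EOF or
   0..UCHAR_MAX) are rejected here instead of being passed to iscntrl(). */
int is_control_checked(int c)
{
    if (c != EOF && (c < 0 || c > UCHAR_MAX))
        return 0;
    return iscntrl(c) != 0;
}

/* Hypothetical helper: counts control characters over the whole range
   for which iscntrl() is defined, 0..UCHAR_MAX. */
int count_control_chars(void)
{
    int count = 0;
    for (int c = 0; c <= UCHAR_MAX; ++c)
        count += is_control_checked(c);
    return count;
}
```

Changing the loop bound in the question from 1024 to UCHAR_MAX has the same effect: every value passed to iscntrl() is then one the standard permits.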

Modifying function that malloc's a 2d array to a 3d array in C

I'm very new to C; this is the first program I'm writing in it. My professor gave us a function that allocates memory for a 2D array, called malloc2D. I am supposed to modify it to allocate memory for a 3D array, but being so new to C I am not sure how to go about it. I've tried looking at other malloc functions for 3D arrays, but none of them look similar to the one I was given. Similarly, we have a free2D function that also needs to be modified for a 3D array. Here are the functions to be modified:
void** malloc2D(size_t rows, size_t cols, size_t sizeOfType){
    void* block = malloc(sizeOfType * rows * cols);
    void** matrix = malloc(sizeof(void*) * rows);
    for (int row = 0; row < rows; ++row) {
        matrix[row] = block + cols * row * sizeOfType;
    }//for
    return matrix;
}//malloc2D
void free2D(void*** matrix){
    free((*matrix)[0]);
    free((*matrix));
    matrix = NULL;
}//free2D
Any help or a start would be greatly appreciated.
I find it difficult to believe this is a first exercise; it is moderately tricky, at least.
Fix the 2D code
The first step should be to clean up the malloc2D() function so that it doesn't casually use a GCC extension (arithmetic on a void *), because Standard C does not allow that: sizeof(void) is undefined in Standard C, whereas GNU C defines it as 1. Also, the bug in free2D() needs to be fixed; the last line of the function should read *matrix = NULL; (the * was omitted). That code should be tested too, because the correct way to access the matrix is not obvious.
Here's some modified code (variables renamed for consistency with the 3D version) that tests the revised 2D code:
/* SO 4885-6272 */
#include <stdlib.h>
#include <stdio.h>
#include <inttypes.h>
/* Should be declared in a header for use in other files */
extern void **malloc2D(size_t rows, size_t cols, size_t sizeOfType);
extern void free2D(void ***matrix);
void **malloc2D(size_t rows, size_t cols, size_t sizeOfType)
{
    void *level2 = malloc(sizeOfType * rows * cols);
    void **level1 = malloc(sizeof(void *) * rows);
    if (level2 == NULL || level1 == NULL)
    {
        free(level2);
        free(level1);
        return NULL;
    }
    for (size_t row = 0; row < rows; ++row)
    {
        level1[row] = (char *)level2 + cols * row * sizeOfType;
    }
    return level1;
}

void free2D(void ***matrix)
{
    free((*matrix)[0]);
    free((*matrix));
    *matrix = NULL;
}

static void test2D(size_t m2_rows, size_t m2_cols)
{
    printf("rows = %zu; cols = %zu\n", m2_rows, m2_cols);
    void **m2 = malloc2D(m2_rows, m2_cols, sizeof(double));
    if (m2 == NULL)
    {
        fprintf(stderr, "Memory allocation failed for 2D array of size %zux%zu doubles\n",
                m2_rows, m2_cols);
        return;
    }
    printf("m2 = 0x%.12" PRIXPTR "; m2[0] = 0x%.12" PRIXPTR "\n",
           (uintptr_t)m2, (uintptr_t)m2[0]);
    for (size_t i = 0; i < m2_rows; i++)
    {
        for (size_t j = 0; j < m2_cols; j++)
            ((double *)m2[i])[j] = (i + 1) * 10 + (j + 1);
    }
    for (size_t i = 0; i < m2_rows; i++)
    {
        for (size_t j = 0; j < m2_cols; j++)
            printf("%4.0f", ((double *)m2[i])[j]);
        putchar('\n');
    }
    free2D(&m2);
    printf("m2 = 0x%.16" PRIXPTR "\n", (uintptr_t)m2);
}

int main(void)
{
    test2D(4, 5);
    test2D(10, 3);
    test2D(3, 10);
    //test2D(300000000, 1000000000); /* 2132 PiB - should fail to allocate on sane systems! */
    return 0;
}
When run on a MacBook Pro running macOS High Sierra 10.13.3, compiling with GCC 7.3.0, I get the output:
rows = 4; cols = 5
m2 = 0x7F83C04027F0; m2[0] = 0x7F83C0402750
11 12 13 14 15
21 22 23 24 25
31 32 33 34 35
41 42 43 44 45
m2 = 0x0000000000000000
rows = 10; cols = 3
m2 = 0x7F83C0402750; m2[0] = 0x7F83C04028C0
11 12 13
21 22 23
31 32 33
41 42 43
51 52 53
61 62 63
71 72 73
81 82 83
91 92 93
101 102 103
m2 = 0x0000000000000000
rows = 3; cols = 10
m2 = 0x7F83C04027A0; m2[0] = 0x7F83C04028C0
11 12 13 14 15 16 17 18 19 20
21 22 23 24 25 26 27 28 29 30
31 32 33 34 35 36 37 38 39 40
m2 = 0x0000000000000000
With the monster allocation included, the trace ended:
alloc3d19(8985,0x7fffa5d79340) malloc: *** mach_vm_map(size=2400000000000000000) failed (error code=3)
*** error: can't allocate region
*** set a breakpoint in malloc_error_break to debug
Memory allocation failed for 2D array of size 300000000x1000000000 doubles
Adapt to 3D code
I chose to call the leading dimension of the 3D array a 'plane'; each plane contains a 2D array with r rows by c columns.
I drew myself a diagram to convince myself I was getting the assignments correct, after I'd messed up a couple of times. In each cell in the first two tables, the first number is the index number of the cell in the containing array (level1 in the first table) and the second is the index number of the cell in the next level (level2 in the first table). The numbers in the level3 table are simply the indexes into the array of doubles.
level1 (planes: 4)
╔═══════╗
║ 0: 00 ║
║ 1: 05 ║
║ 2: 10 ║
║ 3: 15 ║
╚═══════╝
level2 (planes: 4; rows: 5)
╔════════╦════════╦════════╦════════╦════════╗
║ 00: 00 ║ 01: 06 ║ 02: 12 ║ 03: 18 ║ 04: 24 ║
║ 05: 30 ║ 06: 36 ║ 07: 42 ║ 08: 48 ║ 09: 54 ║
║ … ║ … ║ … ║ … ║ … ║
╚════════╩════════╩════════╩════════╩════════╝
level3 (planes: 4; rows: 5; cols: 6)
╔════╦═════╦═════╦═════╦═════╦═════╗
║ 0 ║ 1 ║ 2 ║ 3 ║ 4 ║ 5 ║
║ 6 ║ 7 ║ 8 ║ 9 ║ 10 ║ 11 ║
║ 12 ║ 13 ║ 14 ║ 15 ║ 16 ║ 17 ║ Plane 0
║ 18 ║ 19 ║ 20 ║ 21 ║ 22 ║ 23 ║
║ 24 ║ 25 ║ 26 ║ 27 ║ 28 ║ 29 ║
╠════╬═════╬═════╬═════╬═════╬═════╣
║ 30 ║ 31 ║ 32 ║ 33 ║ 34 ║ 35 ║
║ 36 ║ 37 ║ 38 ║ 39 ║ 40 ║ 41 ║ Plane 1
║ … ║ … ║ … ║ … ║ … ║ … ║
╚════╩═════╩═════╩═════╩═════╩═════╝
With that diagram in place (or a paper and pen version with arrows scrawled over it), the value in the cell of plane p in level1 is p * rows; the value in the cell of plane p, row r in level2 is (p * rows + r) * cols; the value in the cell of plane p, row r, column c in level3 is (p * rows + r) * cols + c. But the values are not integers; they're pointers. Consequently, the values have to be scaled by an appropriate size and added to the base address for the level1, level2 or level3 space.
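As a quick check of that arithmetic against the diagram, here is a small sketch of the three index formulas as plain functions. The function names are hypothetical and they return element indexes only; the scaling to pointers is left to the allocation code.

```c
#include <stddef.h>

/* Hypothetical helper: value stored in level1 for plane p,
   expressed as an index into level2. */
size_t level1_value(size_t p, size_t rows)
{
    return p * rows;
}

/* Hypothetical helper: value stored in level2 for plane p, row r,
   expressed as an index into level3. */
size_t level2_value(size_t p, size_t r, size_t rows, size_t cols)
{
    return (p * rows + r) * cols;
}

/* Hypothetical helper: index of element (p, r, c) in the level3 data. */
size_t level3_index(size_t p, size_t r, size_t c, size_t rows, size_t cols)
{
    return (p * rows + r) * cols + c;
}
```

With 5 rows and 6 columns, plane 1 gives 5 in level1, plane 1 row 1 gives 36 in level2, and the last element of that row is index 41 in level3, which matches the cells shown in the tables.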
That leads to code like this:
#include <stdlib.h>
#include <stdio.h>
#include <inttypes.h>
/* Should be declared in a header for use in other files */
extern void ***malloc3D(size_t planes, size_t rows, size_t cols, size_t sizeOfType);
extern void free3D(void ****matrix);
void ***malloc3D(size_t planes, size_t rows, size_t cols, size_t sizeOfType)
{
    void *level3 = malloc(sizeOfType * planes * rows * cols);
    void **level2 = malloc(sizeof(void *) * planes * rows);
    void ***level1 = malloc(sizeof(void **) * planes);
    //printf("planes = %zu; rows = %zu; cols = %zu; ", planes, rows, cols);
    //printf("level1 = 0x%.12" PRIXPTR "; level2 = 0x%.12" PRIXPTR "; level3 = 0x%.12" PRIXPTR "\n",
    //       (uintptr_t)level1, (uintptr_t)level2, (uintptr_t)level3);
    fflush(stdout);
    if (level3 == NULL || level2 == NULL || level1 == NULL)
    {
        free(level3);
        free(level2);
        free(level1);
        return NULL;
    }
    for (size_t plane = 0; plane < planes; plane++)
    {
        /* level2 holds void * elements, so scale by sizeof(void *) */
        level1[plane] = (void **)((char *)level2 + plane * rows * sizeof(void *));
        //printf("level1[%zu] = 0x%.12" PRIXPTR "\n", plane, (uintptr_t)level1[plane]);
        for (size_t row = 0; row < rows; ++row)
        {
            level2[plane * rows + row] = (char *)level3 + (plane * rows + row) * cols * sizeOfType;
            //printf(" level2[%zu] = 0x%.12" PRIXPTR "\n",
            //       plane * rows + row, (uintptr_t)level2[plane * rows + row]);
        }
    }
    return level1;
}

void free3D(void ****matrix)
{
    free((*matrix)[0][0]);
    free((*matrix)[0]);
    free((*matrix));
    *matrix = NULL;
}

static void test3D(size_t m3_plns, size_t m3_rows, size_t m3_cols)
{
    printf("planes = %zu; rows = %zu; cols = %zu\n", m3_plns, m3_rows, m3_cols);
    void ***m3 = malloc3D(m3_plns, m3_rows, m3_cols, sizeof(double));
    if (m3 == NULL)
    {
        fprintf(stderr, "Memory allocation failed for 3D array of size %zux%zux%zu doubles\n",
                m3_plns, m3_rows, m3_cols);
        return;
    }
    printf("m3 = 0x%.12" PRIXPTR "; m3[0] = 0x%.12" PRIXPTR "; m3[0][0] = 0x%.12" PRIXPTR "\n",
           (uintptr_t)m3, (uintptr_t)m3[0], (uintptr_t)m3[0][0]);
    for (size_t i = 0; i < m3_plns; i++)
    {
        for (size_t j = 0; j < m3_rows; j++)
        {
            for (size_t k = 0; k < m3_cols; k++)
                ((double *)m3[i][j])[k] = (i + 1) * 100 + (j + 1) * 10 + (k + 1);
        }
    }
    for (size_t i = 0; i < m3_plns; i++)
    {
        printf("Plane %zu:\n", i + 1);
        for (size_t j = 0; j < m3_rows; j++)
        {
            for (size_t k = 0; k < m3_cols; k++)
                printf("%4.0f", ((double *)m3[i][j])[k]);
            putchar('\n');
        }
        putchar('\n');
    }
    free3D(&m3);
    printf("m3 = 0x%.16" PRIXPTR "\n", (uintptr_t)m3);
}

int main(void)
{
    test3D(4, 5, 6);
    test3D(3, 4, 10);
    test3D(4, 3, 7);
    test3D(4, 9, 7);
    test3D(30000, 100000, 100000000); /* 2132 PiB - should fail to allocate on sane systems! */
    return 0;
}
Example output (with outsize memory allocation):
planes = 4; rows = 5; cols = 6
m3 = 0x7FFCC94027F0; m3[0] = 0x7FFCC9402750; m3[0][0] = 0x7FFCC9402850
Plane 1:
111 112 113 114 115 116
121 122 123 124 125 126
131 132 133 134 135 136
141 142 143 144 145 146
151 152 153 154 155 156
Plane 2:
211 212 213 214 215 216
221 222 223 224 225 226
231 232 233 234 235 236
241 242 243 244 245 246
251 252 253 254 255 256
Plane 3:
311 312 313 314 315 316
321 322 323 324 325 326
331 332 333 334 335 336
341 342 343 344 345 346
351 352 353 354 355 356
Plane 4:
411 412 413 414 415 416
421 422 423 424 425 426
431 432 433 434 435 436
441 442 443 444 445 446
451 452 453 454 455 456
m3 = 0x0000000000000000
planes = 3; rows = 4; cols = 10
m3 = 0x7FFCC94027F0; m3[0] = 0x7FFCC9402750; m3[0][0] = 0x7FFCC9402840
Plane 1:
111 112 113 114 115 116 117 118 119 120
121 122 123 124 125 126 127 128 129 130
131 132 133 134 135 136 137 138 139 140
141 142 143 144 145 146 147 148 149 150
Plane 2:
211 212 213 214 215 216 217 218 219 220
221 222 223 224 225 226 227 228 229 230
231 232 233 234 235 236 237 238 239 240
241 242 243 244 245 246 247 248 249 250
Plane 3:
311 312 313 314 315 316 317 318 319 320
321 322 323 324 325 326 327 328 329 330
331 332 333 334 335 336 337 338 339 340
341 342 343 344 345 346 347 348 349 350
m3 = 0x0000000000000000
planes = 4; rows = 3; cols = 7
m3 = 0x7FFCC94027F0; m3[0] = 0x7FFCC9402750; m3[0][0] = 0x7FFCC9402840
Plane 1:
111 112 113 114 115 116 117
121 122 123 124 125 126 127
131 132 133 134 135 136 137
Plane 2:
211 212 213 214 215 216 217
221 222 223 224 225 226 227
231 232 233 234 235 236 237
Plane 3:
311 312 313 314 315 316 317
321 322 323 324 325 326 327
331 332 333 334 335 336 337
Plane 4:
411 412 413 414 415 416 417
421 422 423 424 425 426 427
431 432 433 434 435 436 437
m3 = 0x0000000000000000
planes = 4; rows = 9; cols = 7
m3 = 0x7FFCC94027F0; m3[0] = 0x7FFCC9402840; m3[0][0] = 0x7FFCC9802000
Plane 1:
111 112 113 114 115 116 117
121 122 123 124 125 126 127
131 132 133 134 135 136 137
141 142 143 144 145 146 147
151 152 153 154 155 156 157
161 162 163 164 165 166 167
171 172 173 174 175 176 177
181 182 183 184 185 186 187
191 192 193 194 195 196 197
Plane 2:
211 212 213 214 215 216 217
221 222 223 224 225 226 227
231 232 233 234 235 236 237
241 242 243 244 245 246 247
251 252 253 254 255 256 257
261 262 263 264 265 266 267
271 272 273 274 275 276 277
281 282 283 284 285 286 287
291 292 293 294 295 296 297
Plane 3:
311 312 313 314 315 316 317
321 322 323 324 325 326 327
331 332 333 334 335 336 337
341 342 343 344 345 346 347
351 352 353 354 355 356 357
361 362 363 364 365 366 367
371 372 373 374 375 376 377
381 382 383 384 385 386 387
391 392 393 394 395 396 397
Plane 4:
411 412 413 414 415 416 417
421 422 423 424 425 426 427
431 432 433 434 435 436 437
441 442 443 444 445 446 447
451 452 453 454 455 456 457
461 462 463 464 465 466 467
471 472 473 474 475 476 477
481 482 483 484 485 486 487
491 492 493 494 495 496 497
m3 = 0x0000000000000000
planes = 30000; rows = 100000; cols = 100000000
alloc3d79(9018,0x7fffa5d79340) malloc: *** mach_vm_map(size=2400000000000000000) failed (error code=3)
*** error: can't allocate region
*** set a breakpoint in malloc_error_break to debug
Memory allocation failed for 3D array of size 30000x100000x100000000 doubles

Funny behavior from sscanf

The code below reads a text file, data.txt, and prints its contents to the console. The contents of data.txt are shown after the code listing.
#include "stdio.h"
#define BUFFER_SIZE 93
int main(int argc, char *argv[]){
    const char *datafile;
    char line[BUFFER_SIZE],*string1;
    int a,b,c,d,e;
    FILE * File_ptr;
    datafile = "data.txt";
    File_ptr = fopen(datafile,"r");
    if(File_ptr == NULL ){
        printf("Error opening file %s\n",datafile);
    }
    while(fgets(line,BUFFER_SIZE,File_ptr) != 0){
        puts(line);
        sscanf(line, "%d %d %d %d %d %s", &a,&b,&c,&d,&e,string1);
        printf("%d, %d, %d, %d, %d, %s\n",a,b,c,d,e,string1);
    }
    fclose(File_ptr);
}
Content in data.txt:
100 200 888 456 5443 file1.abc
180 670 812 496 5993 file2.abc
160 230 345 546 5123 file3.abc
23 455 342 235 214 file4.abc
233 5455 3142 2435 1214 file5.abc
What I don't understand is: if the BUFFER_SIZE is defined as < 97, the output would be like this:
100 200 888 456 5443 file1.abc
100 200 888 456 5443 (null)
180 670 812 496 5993 file2.abc
180 670 812 496 5993 (null)
160 230 345 546 5123 file3.abc
160 230 345 546 5123 (null)
23 455 342 235 214 file4.abc
23 455 342 235 214 (null)
233 5455 3142 2435 1214 file5.abc
233 5455 3142 2435 1214 (null)
If the BUFFER_SIZE is defined as 97 ~ 120, the output would be OK, like this:
100 200 888 456 5443 file1.abc
100 200 888 456 5443 file1.abc
180 670 812 496 5993 file2.abc
180 670 812 496 5993 file2.abc
160 230 345 546 5123 file3.abc
160 230 345 546 5123 file3.abc
23 455 342 235 214 file4.abc
23 455 342 235 214 file4.abc
233 5455 3142 2435 1214 file5.abc
233 5455 3142 2435 1214 file5.abc
If the BUFFER_SIZE is defined as >120, a segmentation fault will be triggered at the sscanf() call.
Can someone explain the reason for this behavior?
Your string1 is an uninitialized pointer that points nowhere in particular. Your sscanf call attempts to store data at the location string1 points to, which is not valid storage. Your program therefore exhibits undefined behavior: it can segfault, it can output nonsense, it can do anything. The actual behavior of such a program can change for no apparent reason. This is exactly what you observe.
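One way to avoid the problem is to give %s real storage and a width limit. The sketch below assumes file names shorter than 64 characters; struct record, NAME_SIZE and parse_record are hypothetical names introduced for illustration, not part of the question's code.

```c
#include <stdio.h>

#define NAME_SIZE 64  /* assumed upper bound on the file-name length */

/* Hypothetical record type for one line of data.txt. */
struct record {
    int values[5];
    char name[NAME_SIZE];  /* actual storage for %s, unlike a bare pointer */
};

/* Parses one line into *out; returns 1 if all six fields were read,
   0 otherwise. */
int parse_record(const char *line, struct record *out)
{
    /* %63s stores at most NAME_SIZE - 1 characters plus the terminator. */
    int n = sscanf(line, "%d %d %d %d %d %63s",
                   &out->values[0], &out->values[1], &out->values[2],
                   &out->values[3], &out->values[4], out->name);
    return n == 6;
}
```

In the original loop, this would be called as parse_record(line, &rec) after each successful fgets, printing the fields only when it returns 1. Checking the sscanf return value also catches malformed lines.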

CipherSaber bug

So I implemented CipherSaber-1. It almost works: I can decrypt cstest1.cs1, but I have trouble getting cstest2.cs1 to work.
The output is:
The Fourth Amendment to the Constitution of the Unite ▀Stat→s of America
"The right o☻ the people to be secure in their persons, houses, papers, and
effects, against unreasonab→e searches an╚A)┤Xx¹▼☻dcðþÈ_#­0Uc.?n~J¿|,lómsó£k░7╠▄
íuVRÊ ╣├xð"↕(Gû┤.>!{³♫╚Tƒ}Àõ+»~C;ÔÙ²÷g.qÏø←1ß█yÎßsÈ÷g┐ÅJÔÞ┘Îö║AÝf╔ìêâß╗È;okn│CÚê
õ&æÄ[5&Þ½╔s╦Nå1En♂☻♫ôzÓ9»Á╝ÐÅ├ðzÝÎòeØ%W¶]¤▲´Oá╗e_Ú)╣ó0↑ï^☻P>ù♂­¥¯▄‗♦£mUzMצվ~8å
ì½³░Ùã♠,H-tßJ!³*²RóÅ
So I must have a bug in initializing the state. The odd thing is that I can encrypt and decrypt long texts without problems, so the bug is symmetric.
I implemented the RC4 cipher as a reentrant single-byte algorithm, as you can see in rc4.c.
The state is stored in the rc4_state struct:
typedef unsigned char rc4_byte;
struct rc4_state_
{
    rc4_byte i;
    rc4_byte j;
    rc4_byte state[256];
};
typedef struct rc4_state_ rc4_state;
The state is initialized with rc4_init:
void rc4_init(rc4_state* state, rc4_byte* key, size_t keylen)
{
    rc4_byte i, j, n;
    i = 0;
    do
    {
        state->state[i] = i;
        i++;
    }
    while (i != 255);
    j = 0;
    i = 0;
    do
    {
        n = i % keylen;
        j += state->state[i] + key[n];
        swap(&state->state[i], &state->state[j]);
        i++;
    }
    while (i != 255);
    state->i = 0;
    state->j = 0;
}
The actual encryption / decryption is done in rc4:
rc4_byte rc4(rc4_state* state, rc4_byte in)
{
    rc4_byte n;
    state->i++;
    state->j += state->state[state->i];
    swap(&state->state[state->i], &state->state[state->j]);
    n = state->state[state->i] + state->state[state->j];
    return in ^ state->state[n];
}
For completeness, swap:
void swap(rc4_byte* a, rc4_byte* b)
{
    rc4_byte t = *a;
    *a = *b;
    *b = t;
}
I have been breaking my head on this for more than two days... The state, at least for the "asdfg" key is correct. Any help would be nice.
The whole thing can be found in my GitHub repository: https://github.com/rioki/ciphersaber/
I stumbled across your question while searching online, but since you haven't updated your code at GitHub yet, I figured you might still like to know what the problem was.
It's in this bit of code:
i = 0;
do
{
    state->state[i] = i;
    i++;
}
while (i != 255);
After this loop has iterated 255 times, i will have a value of 255 and the loop will terminate. As a result, the last byte of your state buffer is being left uninitialised.
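This is easily fixed. Just change while (i != 255); to while (i);. Because i is an unsigned char, it wraps to 0 after the 256th iteration, which ends the loop. The key-mixing loop later in rc4_init uses the same while (i != 255); condition, so it needs the same change.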
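For reference, here is a minimal sketch of rc4_init with both loops corrected. The types and the rc4 and swap functions follow the question's code and are reproduced only so the block compiles on its own; the const qualifier on key and the assert on keylen are additions, not part of the original.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned char rc4_byte;

typedef struct {
    rc4_byte i;
    rc4_byte j;
    rc4_byte state[256];
} rc4_state;

static void swap(rc4_byte *a, rc4_byte *b)
{
    rc4_byte t = *a;
    *a = *b;
    *b = t;
}

void rc4_init(rc4_state *state, const rc4_byte *key, size_t keylen)
{
    rc4_byte i = 0, j = 0;

    assert(keylen > 0);  /* added guard: i % keylen needs a non-empty key */

    /* Fill all 256 entries: i wraps from 255 to 0, which ends the loop. */
    do {
        state->state[i] = i;
        i++;
    } while (i);

    /* Key mixing, again over all 256 entries (i is 0 at this point). */
    do {
        j += state->state[i] + key[i % keylen];
        swap(&state->state[i], &state->state[j]);
        i++;
    } while (i);

    state->i = 0;
    state->j = 0;
}

/* Encrypts or decrypts one byte, as in the question. */
rc4_byte rc4(rc4_state *state, rc4_byte in)
{
    rc4_byte n;
    state->i++;
    state->j += state->state[state->i];
    swap(&state->state[state->i], &state->state[state->j]);
    n = state->state[state->i] + state->state[state->j];
    return in ^ state->state[n];
}
```

A do/while with an 8-bit counter is kept here to stay close to the original; a plain for loop with an int counter from 0 to 255 would avoid relying on wraparound.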
Sorry you haven't gotten feedback. I finally pulled this off in Python 3 today, but I don't know enough about C to debug your code.
Some of the links on the main ciphersaber page are broken (pointing to ".com" instead of ".org"), so you might not have found the FAQ:
http://ciphersaber.gurus.org/faq.html
It includes the following debugging tips:
Make sure you are not reading or writing encrypted files as text files. You must use binary mode for file I/O.
If you are writing in the C language, be sure to store bytes as unsigned char.
Watch out for classic indexing problems. Do arrays in your chosen programming language start with 0 or 1?
Make sure you are writing out a random 10 byte IV when you encrypt and are reading the IV from the start of the file when you decrypt.
If your program still does not work, put in some statements to print out the S array after the key setup step. Then run your program to decrypt the file cstest1.cs1 using asdfg as the key. Here is how the S array should look:
file: cstest1.cs1
key: asdfg
176 32 49 160 15 112 58 8 186 19 50 161 60 17 82 153 37 141 131 127 59
2 165 103 98 53 9 57 41 150 174 64 36 62 191 154 44 136 149 158 226
113 230 227 247 155 221 34 125 20 163 95 128 219 1 181 201 146 88 204
213 80 143 164 145 234 134 248 100 77 188 235 76 217 194 35 75 99 126
92 243 177 52 180 83 140 198 42 151 18 91 33 16 192 101 48 97 220 114
110 124 72 139 218 142 118 81 84 31 29 195 68 209 172 200 214 93 240
61 22 206 123 152 7 203 10 119 171 79 250 109 137 199 167 11 104 211
129 208 216 178 207 242 162 30 120 65 115 87 170 47 69 244 212 45 85
73 222 225 185 63 0 179 210 108 245 202 46 96 148 51 173 24 182 89 116
3 67 205 94 231 23 21 13 169 215 190 241 228 132 252 4 233 56 105 26
12 135 223 166 238 229 246 138 239 54 5 130 159 236 66 175 189 147 193
237 43 40 117 157 86 249 74 27 156 14 133 251 196 187 197 102 106 39
232 255 121 122 253 111 90 38 55 70 184 78 224 25 6 107 168 254 144 28
183 71
I also found the "memorable test cases" helpful here:
http://www.cypherspace.org/adam/csvec/
Including:
key="Al"+ct="Al Dakota buys"(iv="Al Dakota "):
pt = "mead"
Even though the memorable test cases require CipherSaber-2 (cs2), upgrading from cs1 to cs2 is fairly trivial, so you may be able to convert your program confidently even without fully debugging the rest of it.
Also note that the FAQ says there used to be a file on the site that wouldn't decode; make sure your target file doesn't begin with "0e e3 f9 b2 40 11 fc 3e ..."
(Though I think that was a smaller test file, not the certificate.)
Oh, and also know that the site's not really up to date on the latest research into RC4 and derivatives. Just reserve this as a toy program unless all else fails.
Python
Here's one I wrote in Python for a question that later got deleted. It processes the file as a stream so memory usage is modest.
Usage
python encrypt.py <key> <rounds> < <infile> > <outfile>
python decrypt.py <key> <rounds> < <infile> > <outfile>
rc4.py
#!/usr/bin/env python
# coding: utf-8
# Python 2 code (uses psyco and str-based stdin/stdout)
import psyco
from sys import stdin,stdout,argv

def rc4(K):
    R=range(256)
    S=R[:]
    T=bytearray(K*256)[:256]
    j=0
    # key setup, repeated <rounds> times (argv[2])
    for i in R*int(argv[2]):
        j=j+S[i]+T[i]&255
        S[i],S[j]=S[j],S[i]
    i=j=0
    while True:
        B=stdin.read(4096)
        if not B: break
        for c in B:
            i=i+1&255   # wrap i to 0..255 (i+=1&255 would never wrap)
            j=j+S[i]&255
            S[i],S[j]=S[j],S[i]
            stdout.write(chr(ord(c)^S[S[i]+S[j]&255]))

psyco.bind(rc4)
encrypt.py
from rc4 import *
import os
V=os.urandom(10)
stdout.write(V)
rc4(argv[1]+V)
decrypt.py
from rc4 import *
V=stdin.read(10)
rc4(argv[1]+V)
