Ubuntu C UART communication loses the first byte

I have written a new summary, run more tests, and posted them here: https://forums.developer.nvidia.com/t/jetson-nx-uart-communiation-lost-first-byte/234877
I wrote UART software in C++ on a Jetson NX (Ubuntu) to communicate with a PC.
On the PC, a UART simulator sends the following data once per second:
EB90021112131415161718191A1B1C1D1E1F202122232425262728292A
When my software receives the data, it sometimes loses the first byte "EB", like this:
90021112131415161718191A1B1C1D1E1F202122232425262728292A
while at other times the first byte EB is not lost.
Besides, I tried a Python program to receive data from the UART on the Jetson NX, and it always works.
My C code is below.
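#include <fcntl.h>
#include <stdio.h>
#include <strings.h>
#include <termios.h>
#include <unistd.h>
int TestUart(const char * portName, int nSpeed, int nBits, char nEvent, int nStop);
int main(void){
TestUart("/dev/ttyTHS0",9600,8,'N',1);
return 0;
}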
int TestUart(const char * portName, int nSpeed, int nBits, char nEvent, int nStop)
{
int Port = open(portName, O_RDWR | O_NOCTTY);
tcflush(Port, TCIFLUSH);
tcflush(Port, TCIOFLUSH);
usleep(500000);
if (Port == -1)
{
printf("UART.cpp : Unable to open port.");
return -1;
}
else
{
printf("UART.cpp : Port opened. Setting Port options...");
}
int fd = Port;
struct termios newtio, oldtio;
if (tcgetattr(fd, &oldtio) != 0)
{
perror("SetupSerial 1");
return -1;
}
bzero(&newtio, sizeof(newtio));
newtio.c_cflag |= CLOCAL | CREAD;
newtio.c_cflag &= ~CSIZE;
switch (nBits)
{
case 7:
newtio.c_cflag |= CS7;
break;
case 8:
newtio.c_cflag |= CS8;
break;
}
switch (nEvent)
{
case 'O':
case 'o':
newtio.c_cflag |= PARENB;
newtio.c_cflag |= PARODD;
break;
case 'E':
case 'e':
newtio.c_cflag |= PARENB;
newtio.c_cflag &= ~PARODD;
break;
case 'N':
case 'n':
newtio.c_cflag &= ~PARENB;
break;
}
cfsetispeed(&newtio,B9600);
cfsetospeed(&newtio,B9600);
newtio.c_cflag &= ~CRTSCTS;
newtio.c_iflag &= ~(IXON | IXOFF | IXANY);
if (nStop == 1)
{
newtio.c_cflag &= ~CSTOPB;
}
else if (nStop == 2)
{
newtio.c_cflag |= CSTOPB;
}
newtio.c_cc[VTIME] = 0;
newtio.c_cc[VMIN] = 1;
newtio.c_lflag &= ~(ICANON | ECHO | ECHOE | ISIG);
newtio.c_oflag &= ~OPOST;
if ((tcsetattr(fd, TCSANOW, &newtio)) != 0)
{
perror("com set error");
return -1;
}
tcflush(fd, TCIFLUSH);
int num = 0;
char rx_data[256];
while(1)
{
int m_curPacket_length = read(Port, rx_data,256);
tcflush(Port, TCIFLUSH);
tcflush(Port, TCIOFLUSH);
printf(" Data %d [port = %s,length = %d]: ",++num,portName,m_curPacket_length);
for (int i = 0; i < m_curPacket_length; i++) {
printf("%X ", rx_data[i]);
}
printf("\n");
}
printf("set UART port parameter done!\n");
return 0;
}
I hope someone can help me find the bug.
Thanks a lot!
Besides, when I use non-blocking mode by setting VTIME=0 and VMIN=0, no bytes are lost.
Thanks for your advice!
1. On the Jetson NX I really do get a different result, as I mentioned in the question. Maybe it is a hardware difference? I have added more hardware information to my main question post.
This (poorly formatted) result does not have any resemblance to the single line that you claim that you get. More importantly there are no "missing" bytes as all 29 bytes of the message are always received and displayed, although there is a sign-extension issue with the first two bytes.
2. Please ignore the empty lines; that was just an experiment.
Again your description of the results that you claim your code produces is inaccurate and misleading. You neglect to mention that your code will produce extraneous text while incessantly polling the system:
3. I changed the code as below; the same problem remains.
- bzero(&newtio, sizeof(newtio));
+ newtio = oldtio;
4. The change below still has no effect.
- char rx_data[256];
+ unsigned char rx_data[256];
5. With the change below, the output is as follows:
- newtio.c_cc[VTIME] = 0;
- newtio.c_cc[VMIN] = 1;
+ newtio.c_cc[VTIME] = 1;
+ newtio.c_cc[VMIN] = 29;
Output:
Data 1 [port = /dev/ttyUSB0,length = 57]:90021112131415161718191A1B1C1D1E1F202122232425262728292AEB90021112131415161718191A1B1C1D1E1F202122232425262728292A
Data 2 [port = /dev/ttyUSB0,length = 57]:90021112131415161718191A1B1C1D1E1F202122232425262728292AEB90021112131415161718191A1B1C1D1E1F202122232425262728292A
The first 'EB' is still lost.

When my software receives the data, it sometimes loses the first byte "EB", like this:
90021112131415161718191A1B1C1D1E1F202122232425262728292A
while at other times the first byte EB is not lost.
...
My C code is below.
...
Assuming that the Jetson Nx is not horribly slow in comparison to the 9600 baud, executing your code as posted (on a PC) produces the following results for me:
$ ./a.out
UART.cpp : Port opened. Setting Port options... Data 1 [port = /dev/ttyUSB0,length = 1]: FFFFFFEB
Data 2 [port = /dev/ttyUSB0,length = 1]: FFFFFF90
Data 3 [port = /dev/ttyUSB0,length = 1]: 2
Data 4 [port = /dev/ttyUSB0,length = 1]: 11
Data 5 [port = /dev/ttyUSB0,length = 1]: 12
Data 6 [port = /dev/ttyUSB0,length = 1]: 13
Data 7 [port = /dev/ttyUSB0,length = 1]: 14
Data 8 [port = /dev/ttyUSB0,length = 1]: 15
Data 9 [port = /dev/ttyUSB0,length = 1]: 16
Data 10 [port = /dev/ttyUSB0,length = 1]: 17
Data 11 [port = /dev/ttyUSB0,length = 1]: 18
Data 12 [port = /dev/ttyUSB0,length = 1]: 19
Data 13 [port = /dev/ttyUSB0,length = 1]: 1A
Data 14 [port = /dev/ttyUSB0,length = 1]: 1B
Data 15 [port = /dev/ttyUSB0,length = 1]: 1C
Data 16 [port = /dev/ttyUSB0,length = 1]: 1D
Data 17 [port = /dev/ttyUSB0,length = 1]: 1E
Data 18 [port = /dev/ttyUSB0,length = 1]: 1F
Data 19 [port = /dev/ttyUSB0,length = 1]: 20
Data 20 [port = /dev/ttyUSB0,length = 1]: 21
Data 21 [port = /dev/ttyUSB0,length = 1]: 22
Data 22 [port = /dev/ttyUSB0,length = 1]: 23
Data 23 [port = /dev/ttyUSB0,length = 1]: 24
Data 24 [port = /dev/ttyUSB0,length = 1]: 25
Data 25 [port = /dev/ttyUSB0,length = 1]: 26
Data 26 [port = /dev/ttyUSB0,length = 1]: 27
Data 27 [port = /dev/ttyUSB0,length = 1]: 28
Data 28 [port = /dev/ttyUSB0,length = 1]: 29
Data 29 [port = /dev/ttyUSB0,length = 1]: 2A
This (poorly formatted) result does not have any resemblance to the single line that you claim that you get.
More importantly there are no "missing" bytes as all 29 bytes of the message are always received and displayed, although there is a sign-extension issue with the first two bytes.
Besides, when I set non-blocking mode with setting VTIME=0 and VMIN=0, there is no byte loss.
Again your description of the results that you claim your code produces is inaccurate and misleading. You neglect to mention that your code will produce extraneous text while incessantly polling the system:
UART.cpp : Port opened. Setting Port options... Data 1 [port = /dev/ttyUSB0,length = 0]:
Data 2 [port = /dev/ttyUSB0,length = 0]:
Data 3 [port = /dev/ttyUSB0,length = 0]:
Data 4 [port = /dev/ttyUSB0,length = 0]:
Data 5 [port = /dev/ttyUSB0,length = 0]:
Data 6 [port = /dev/ttyUSB0,length = 0]:
Data 7 [port = /dev/ttyUSB0,length = 0]:
Data 8 [port = /dev/ttyUSB0,length = 0]:
Data 9 [port = /dev/ttyUSB0,length = 0]:
Data 10 [port = /dev/ttyUSB0,length = 0]:
Data 11 [port = /dev/ttyUSB0,length = 0]:
Data 12 [port = /dev/ttyUSB0,length = 0]:
Data 13 [port = /dev/ttyUSB0,length = 0]:
Data 14 [port = /dev/ttyUSB0,length = 0]:
Data 15 [port = /dev/ttyUSB0,length = 0]:
Data 16 [port = /dev/ttyUSB0,length = 0]:
Data 17 [port = /dev/ttyUSB0,length = 0]:
Data 18 [port = /dev/ttyUSB0,length = 0]:
Data 19 [port = /dev/ttyUSB0,length = 0]:
Data 20 [port = /dev/ttyUSB0,length = 0]:
Data 21 [port = /dev/ttyUSB0,length = 0]:
Data 22 [port = /dev/ttyUSB0,length = 0]:
...
How do you manage to sort out the text that indicates received data versus the garbage of no input?
Your initialization is flawed because a zeroed-out termios struct is used. See Setting Terminal Modes Properly.
I have tried not using a zeroed-out termios struct, but the same problem remains.
if (tcgetattr(fd, &newtio) != 0) {
perror("SetupSerial 1");
return -1;
}
//bzero(&newtio, sizeof(newtio));
You've done a poor job of following my advice, since you clearly did not bother to study the linked guide.
By simply deleting the bzero() statement, you are now using an uninitialized structure, which could be a worse bug and should have generated a compiler warning.
The proper correction to your code is (using patch notation):
- bzero(&newtio, sizeof(newtio));
+ newtio = oldtio;
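As a fuller illustration of the same idea, the sketch below starts from attributes that were already filled in (normally by tcgetattr()) and changes only the fields that matter. The helper name `make_raw_9600_8n1` is hypothetical, and the settings simply mirror the ones in the question (9600 baud, 8N1, no flow control, VMIN=1, VTIME=0); treat it as a starting point, not as the definitive configuration for the Jetson.

```c
#include <termios.h>

/* Hypothetical helper: derive raw 9600 8N1 settings from an existing
 * termios struct, typically the one returned by tcgetattr(). */
static int make_raw_9600_8n1(struct termios *tio)
{
    /* Input: no software flow control, no CR/NL translation. */
    tio->c_iflag &= ~(IXON | IXOFF | IXANY | ICRNL | INLCR | IGNCR |
                      ISTRIP | INPCK);
    /* Output: no post-processing. */
    tio->c_oflag &= ~OPOST;
    /* Local: non-canonical mode, no echo, no signal characters. */
    tio->c_lflag &= ~(ICANON | ECHO | ECHOE | ISIG);
    /* Control: 8 data bits, no parity, 1 stop bit. */
    tio->c_cflag &= ~(CSIZE | PARENB | PARODD | CSTOPB);
#ifdef CRTSCTS
    tio->c_cflag &= ~CRTSCTS;   /* no hardware flow control, if supported */
#endif
    tio->c_cflag |= CS8 | CLOCAL | CREAD;
    /* Block until at least one byte arrives, with no timeout. */
    tio->c_cc[VMIN] = 1;
    tio->c_cc[VTIME] = 0;
    if (cfsetispeed(tio, B9600) != 0 || cfsetospeed(tio, B9600) != 0)
        return -1;
    return 0;
}
```

In the question's code this would be called on newtio after `newtio = oldtio;` and before tcsetattr(fd, TCSANOW, &newtio).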
Your userspace program is not synchronized with UART I/O, so flushing system buffers (discarding I/O data) is prone to inadvertent loss of real data. In other words, you're misusing the flush directive.
And with no flush, as shown below, the problem still remains.
It turns out, despite the flaws in your code, I can use your original code to consistently receive the messages of 29 bytes on my PC. Although the flaws do not cause any observed misbehavior currently, they must be corrected for reliable and robust code.
I am unable to understand what you are calling "missing bytes". Is the sign-extended representation of the first two bytes the problem?
That is simply corrected by using the appropriate type for the receive buffer:
- char rx_data[256];
+ unsigned char rx_data[256];
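To see where FFFFFFEB comes from: a byte stored in a signed char is promoted to int before printf sees it, so 0xEB becomes a negative int and %X prints its full width. The sketch below shows both sides. The helper names are made up for illustration, and `promoted_signed` assumes a 32-bit int and the usual two's complement conversion.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: format len bytes as two-digit uppercase hex,
 * separated by spaces. Returns the number of characters written
 * (excluding the terminating NUL), or 0 if out is too small. */
static size_t hex_line(const unsigned char *buf, size_t len,
                       char *out, size_t out_size)
{
    size_t pos = 0;
    if (out_size == 0)
        return 0;
    out[0] = '\0';
    for (size_t i = 0; i < len; i++) {
        /* Each byte needs at most 3 characters plus the NUL. */
        if (out_size - pos < 4)
            return 0;
        pos += (size_t)snprintf(out + pos, out_size - pos,
                                i ? " %02X" : "%02X", (unsigned int)buf[i]);
    }
    return pos;
}

/* The value printf("%X") receives when the byte lives in a signed char:
 * it is promoted to int first, so bytes >= 0x80 become negative. */
static unsigned int promoted_signed(unsigned char byte)
{
    return (unsigned int)(int)(signed char)byte;
}
```

With an unsigned buffer, the bytes EB 90 02 format as "EB 90 02", while `promoted_signed(0xEB)` yields 0xFFFFFFEB under the stated assumptions, which matches the first line of the output shown earlier.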
Given that your system is receiving a message of 29 bytes each second, you can improve the efficiency of each read() syscall by trying to fetch a full message. Try changing the following termios attributes:
- newtio.c_cc[VTIME] = 0;
- newtio.c_cc[VMIN] = 1;
+ newtio.c_cc[VTIME] = 1;
+ newtio.c_cc[VMIN] = 29;
The program can then read a complete message per syscall:
$ ./a.out
UART.cpp : Port opened. Setting Port options...
Data 1 [port = /dev/ttyUSB0,length = 29]: EB 90 2 11 12 13 14 15 16 17 18 19 1A 1B 1C 1D 1E 1F 20 21 22 23 24 25 26 27 28 29 2A
Data 2 [port = /dev/ttyUSB0,length = 29]: EB 90 2 11 12 13 14 15 16 17 18 19 1A 1B 1C 1D 1E 1F 20 21 22 23 24 25 26 27 28 29 2A
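Even with VMIN=29 and VTIME=1, read() may return fewer bytes than a whole message, because VTIME then acts as an inter-byte timer and the call returns early if a gap exceeds it. A common way to cope is to keep reading until the expected count has arrived. The following is a minimal sketch of that loop; `read_full` is a hypothetical name, and it does not resynchronize on the EB 90 header if the program happens to start in the middle of a message.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: call read() until want bytes have been stored.
 * Returns the number of bytes stored (less than want only if read()
 * reported 0 bytes), or -1 on error. */
static ssize_t read_full(int fd, unsigned char *buf, size_t want)
{
    size_t got = 0;
    while (got < want) {
        ssize_t n = read(fd, buf + got, want - got);
        if (n > 0) {
            got += (size_t)n;   /* partial read: keep going */
        } else if (n == 0) {
            break;              /* end-of-file, or a timeout with VMIN=0 */
        } else if (errno != EINTR) {
            return -1;          /* real error; errno was set by read() */
        }
        /* n < 0 with EINTR: interrupted by a signal, simply retry */
    }
    return (ssize_t)got;
}
```

In the receive loop, `read_full(Port, rx_data, 29)` would replace the single read() call, and requesting exactly 29 bytes also avoids picking up the start of the next message in the same call.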

Related

Output of an unsigned char array [duplicate]

I have the following code:
#include <stdio.h>
#include <stdlib.h>
int main ( void ) {
unsigned int array [] = { 298 , 0x1A2A3A4A };
unsigned char *p = ( unsigned char *) array ;
for (int i = 4; i < 8; i ++) {
printf ("%hhX ", p[i]) ;
}
printf ("\nThe Answer is %d or %d!\n", p[0] , p [6]) ;
return EXIT_SUCCESS ;
}
I don't understand the output:
4A 3A 2A 1A
The Answer is 42 or 42
On a little endian system, the layout of the 8 byte array is
Position: 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7
Value (Hex): 2A | 01 | 00 | 00 | 4A | 3A | 2A | 1A
Your loop prints the last 4 bytes in the order they appear in the array. Your final printf prints the values at the 0 and 6 position, which are both 0x2A, or 42 in decimal.
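The table can be reproduced without depending on the host's byte order by extracting bytes with shifts. This is only an illustrative sketch: the function names are invented here, and it assumes unsigned int is 32 bits wide, as in the question.

```c
#include <stddef.h>
#include <stdint.h>

/* Byte i (0 = lowest address, i < 4) of v as a little-endian machine
 * stores it. Uses shifts, so the result is the same on any host. */
static uint8_t le_byte(uint32_t v, size_t i)
{
    return (uint8_t)(v >> (8 * i));
}

/* Returns 1 if this host stores the least significant byte first. */
static int host_is_little_endian(void)
{
    const uint32_t probe = 1;
    return *(const unsigned char *)&probe == 1;
}
```

For example, `le_byte(298, 0)` is 0x2A (298 is 0x0000012A) and `le_byte(0x1A2A3A4A, 2)` is also 0x2A, which are positions 0 and 6 in the layout above.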

Concatenate 5 or more bytes and convert to decimal and then to ASCII

I have the array below:
dataIn[5] = 0x88;
dataIn[6] = 0x2A;
dataIn[7] = 0xC7;
dataIn[8] = 0x2B;
dataIn[9] = 0x00;
dataIn[10] = 0x28;
I need to convert these values to decimal, then convert the decimal digits to ASCII and send them over the UART.
Eg:
| Hexa | Decimal | ASCII (I need to send this data to UART)
| 0x882AC72B00 | 584 833 248 000 | 35 38 34 38 33 33 32 34 38 30 30 30
| 0x5769345612 | 375 427 192 338 | 33 37 35 34 32 37 31 39 32 33 33 38
My problem: these bytes should be combined into a single value and converted to decimal, but my compiler only supports integers of up to 4 bytes, and I always have 5 or more bytes, so I don't know how to do this.
Ps.: I'm using PIC18F46K80 and C18 compiler
Could anyone help me?
Thanks in advance.
If I have understood correctly, you should first define a union like this:
typedef union _DATA64
{
uint64_t dataIn64;
uint8_t dataIn8[8];
}tu_DATA64;
and then copy the hex values into the previously defined union:
uint8_t i;
tu_DATA64 data;
...
data.dataIn64=0;
for(i=0; i<5; i++)
data.dataIn8[4-i]=dataIn[i];
Now convert the 64-bit variable to a string using the lltoa function (a non-standard function, so check that your toolchain provides it):
char *str;
...
str=lltoa(data.dataIn64,10);
str is then the string buffer to send.
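On a compiler that does provide a 64-bit integer type, the same result can be had with shifts and snprintf instead of a union and lltoa, which avoids depending on the host's byte order. This is a sketch for a hosted C99 toolchain; the function names are hypothetical, and it may well not apply to C18 on the PIC18 if that compiler lacks uint64_t.

```c
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Combine up to 8 bytes, most significant byte first, into one integer. */
static uint64_t be_bytes_to_u64(const uint8_t *bytes, size_t n)
{
    uint64_t value = 0;
    for (size_t i = 0; i < n && i < 8; i++)
        value = (value << 8) | bytes[i];
    return value;
}

/* Write value as decimal ASCII digits into out. Returns the number of
 * digits, or 0 if out is too small to hold them plus the NUL. */
static size_t u64_to_decimal(uint64_t value, char *out, size_t out_size)
{
    int n = snprintf(out, out_size, "%" PRIu64, value);
    return (n > 0 && (size_t)n < out_size) ? (size_t)n : 0;
}
```

For the bytes 88 2A C7 2B 00 from the question this gives 584833248000, and each character of the resulting string is already the ASCII code to send (0x35, 0x38, 0x34, ...).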
Have you considered writing your own conversion function? Here's a working example that can be adjusted to any length.
WARNING: My C skills are not the best!
#include <stdio.h>
/******************************************************************************/
void base10_ascii(unsigned char data[], int data_size, char ans[], int ans_size) {
char done;
do {
char r = 0;
done = 1;
for (int i=0; i<data_size; i++) {
int b = (r<<8) + data[i]; //previous remainder and current byte
data[i] = b / 10;
if (data[i] > 0) done = 0; //if any digit is non-zero, not done yet
r = b % 10;
}
for (int i=ans_size-1; i>0; i--) ans[i] = ans[i-1]; //bump up result
ans[0] = r + '0'; //save next digit as ASCII (right to left)
} while (!done);
}
/******************************************************************************/
int main(){
char outputBuffer[15] = {0};
unsigned char data[] = { 0x88, 0x2A, 0xC7, 0x2B, 0x00 }; //584833248000
base10_ascii(data,sizeof data,outputBuffer,sizeof outputBuffer);
printf("Output: %s\n",outputBuffer);
return 0;
}

STM32F100RBT6B usart data deformation

I have one radio transmitter and one radio receiver, as in the picture. Here is what I am trying to do.
I send data through USART2 PA2 (pin 2) on the transmitter.
I connected a USB-UART adapter to PA2 (pin 2) and looked at the data in a terminal. I saw something like this (11 22 33 44 11 22 33 44 11 22 33 44 11 22 33 44 11 22 33 44 11 22 33 44 11 22 33 44 11 22 33 44 11 22 33 44 ). That is fine; it is what I want to transmit.
Then I connected the USB-UART adapter to PA10 (pin 10) on the radio receiver and looked in the terminal. There was a lot of data, but most of it was (44 FD 11 22 33 44 7F 10 22 33 44 FE 11 22 33 44 00 00 00 A4 26 F4 F8 11 22 33 44 FE 11 22 33 44 FA 10 22 33 44 FF 11 22 33 44 FE 11 22 33 44 D6 11 22 33 44 ). That is also fine: the receiver module delivers the data that was transmitted, plus something else.
Then I connected the USB-UART adapter to PA9 (pin 9), and in the terminal there was only (11 33 44 FC 11 33 44 FF 10 33 44 FF 11 33 44 FE ).
When USART1 receives data from the transmitter on pin 10 (44 FD 11 22 33 44 7F 10 ), it receives the full sequence (11 22 33 44), but when it retransmits on pin 9 I see only (11 33 44) on the USB-UART adapter.
Where did the one byte (22) go? Where is the error, and why is it happening? Thanks!
The main question: what am I doing wrong? Why does USART1 TX corrupt the data, and how can I fix it?
#include "stm32f10x.h"
#include "stm32f10x_gpio.h"
#include "stm32f10x_rcc.h"
#define SYNC 0xAA
#define RADDR 0x44
#define LEDON 0x11//switch led on command
#define LEDOFF 0x22//switch led off command
void Delay(uint32_t value)
{
volatile uint32_t i;
for (i = 0; i != value; i++)
;
}
void send_to_uart1(uint8_t data)
{
while(!(USART1->SR & USART_SR_TC))
{
}
USART1->DR=data;
}
void send_to_uart2(uint8_t data)
{
while (!(USART2->SR & USART_SR_TXE));
USART2->DR = data;
}
uint8_t GetChar (void) {
while (!(USART1->SR & USART_SR_RXNE));
return ((USART1->DR));
}
uint8_t uart_data;
int k=0;
char message[4]={0x11,0x22,0x33,0x44};
int main(void)
{
GPIO_InitTypeDef PORTA_init_struct;
RCC_APB2PeriphClockCmd(RCC_APB2Periph_GPIOA | RCC_APB2Periph_USART1, ENABLE);
RCC_APB1PeriphClockCmd(RCC_APB1Periph_USART2, ENABLE);
PORTA_init_struct.GPIO_Pin = GPIO_Pin_9;
PORTA_init_struct.GPIO_Speed = GPIO_Speed_2MHz;
PORTA_init_struct.GPIO_Mode = GPIO_Mode_AF_PP;
GPIO_Init(GPIOA, &PORTA_init_struct);
PORTA_init_struct.GPIO_Pin = GPIO_Pin_10;
PORTA_init_struct.GPIO_Speed = GPIO_Speed_2MHz;
PORTA_init_struct.GPIO_Mode = GPIO_Mode_IN_FLOATING;
GPIO_Init(GPIOA, &PORTA_init_struct);
PORTA_init_struct.GPIO_Pin = GPIO_Pin_2;
PORTA_init_struct.GPIO_Speed = GPIO_Speed_2MHz;
PORTA_init_struct.GPIO_Mode = GPIO_Mode_AF_PP;
GPIO_Init(GPIOA, &PORTA_init_struct);
PORTA_init_struct.GPIO_Pin = GPIO_Pin_3;
PORTA_init_struct.GPIO_Speed = GPIO_Speed_2MHz;
PORTA_init_struct.GPIO_Mode = GPIO_Mode_IN_FLOATING;
GPIO_Init(GPIOA, &PORTA_init_struct);
USART_InitTypeDef uart_struct1;
uart_struct1.USART_BaudRate = 1200;
uart_struct1.USART_WordLength = USART_WordLength_8b;
uart_struct1.USART_StopBits = USART_StopBits_1;
uart_struct1.USART_Parity = USART_Parity_No;
uart_struct1.USART_HardwareFlowControl = USART_HardwareFlowControl_None;
uart_struct1.USART_Mode = USART_Mode_Rx | USART_Mode_Tx;
USART_Init(USART1, &uart_struct1);
USART_Cmd(USART1, ENABLE);
USART_InitTypeDef uart_struct2;
uart_struct2.USART_BaudRate = 1200;
uart_struct2.USART_WordLength = USART_WordLength_8b;
uart_struct2.USART_StopBits = USART_StopBits_1;
uart_struct2.USART_Parity = USART_Parity_No;
uart_struct2.USART_HardwareFlowControl = USART_HardwareFlowControl_None;
uart_struct2.USART_Mode = USART_Mode_Rx | USART_Mode_Tx;
USART_Init(USART2, &uart_struct2);
USART_Cmd(USART2, ENABLE);
while (1) {
while(k<4)
{
for (int j=0;j<4;j++)
{
send_to_uart2(message[k]);
k=k+1;
}
for (int i=0;i<4;i++)
{
send_to_uart1(GetChar());
}
}
k=0;
}
}

(C) how to fix this algorithm for z827 ASCII compression?

noob warning.
I'm trying to create a compression program. It takes a .txt with ASCII characters as an argument, and cuts off the leading 0 of the binary representation of each character.
It does this by using the last 2 bytes of two different integers. A character with a leading zero is put into the 4th byte of the integer 'write', and the next character is put into the 3rd byte of the integer 'temp'. The 'temp' int is then shifted to the right once, and then OR'd with 'write', so that the leading zero slot has been filled with data we need. This repeats, with the shift counter increasing after every character. The first case is a bit odd. The algorithm isn't very complex if written out on paper.
I feel like I've tried everything. I've been over the algorithm so many times. I'm pretty sure the problem is when shift_counter gets to 8.. but it should work fine. It just doesn't. I can show you why here (the code is further down):
This is the hex dump of my output:
0000000 3f 00 00 00 41 10 68 9e 6e c3 d9 65 10 88 5e c6
0000020 d3 41 e6 74 9a 5d 06 d1 df a0 7a 7d 5e 06 a5 dd
0000040 20 3a bd 3c a7 a7 dd 67 10 e8 5d a7 83 e8 e8 72
0000060 19 a4 c7 c9 6e a0 f1 f8 dd 86 cb cb f3 f9 3c
0000077
And the correct output:
0000000 3f 00 00 00 41 d0 3c dd 86 b3 cb 20 7a 19 4f 07
0000020 99 d3 ec 32 88 fe 06 d5 e7 65 50 da 0d a2 97 e7
0000040 f4 b4 fb 0c 7a d7 e9 20 3a ba 0c d2 e3 64 37 d0
0000060 f8 dd 86 cb cb f3 79 fa ed 76 29 00 0a 0a
0000076
code:
int compress(char *filename_ptr){
int in_fd;
in_fd = open(filename_ptr, O_RDONLY);
//set pointer to the end of the file, find file size, then reset position
//by closing/opening
unsigned int file_bytes = lseek(in_fd, 0, SEEK_END);
close(in_fd);
in_fd = open(filename_ptr, O_RDONLY);
//store file contents in buffer
unsigned char read_buffer[file_bytes];
read(in_fd, read_buffer, file_bytes);
//file where the output will be stored
int out_fd;
creat("output.txt", 0644);
out_fd = open("output.txt", O_WRONLY);
//sets file size in header (needed for decompression, this is the size of the
//file before compression. everything after this we write this 4-byte int
//is a 1 byte char
write(out_fd, &file_bytes, 4);
unsigned int writer;
unsigned int temp;
unsigned char out_char;
int i;
int shift_count = 8;
for(i = 0; i < file_bytes; i++){
if(shift_count == 8){
writer = read_buffer[i];
temp = temp & 0x00000000;
temp = read_buffer[i+1] << 8;
shift_count = 1;
}else{
//moves the next char's bits to the left, for the purpose of filling the
//8 bit buffer (writer) via OR operation
temp = read_buffer[i] << 8;
}
temp = temp >> shift_count;
writer = writer | temp;
//output right byte of writer
unsigned int right_byte = writer & 0x000000ff;
//output right_byte as a char
out_char = (char) right_byte;
//write_buffer[i] = out_char;
write(out_fd, &out_char, 1);
//clear right side of writer
writer = writer & 0x0000ff00;
//shift left side of writer to the right by 8
writer = writer >> 8;
shift_count++;
}
return 0;
}
It seems to me that input and output are too strongly coupled.
At some point, the program should be reading (roughly) the 80th octet from the input and writing (roughly) the 70th octet to the output, because you want to (on average) write 7 bits out for every 8 bits you read in, right?
What the loop
for(i = 0; i < file_bytes; i++){
...
... = read_buffer[i];
...
write(out_fd, &out_char, 1);
...
}
actually seems to be doing is:
On the 70th pass through the loop -- when 70==i --
it's reading the 70th octet from the input and writing the 70th octet to the output.
On the 80th pass through the loop -- when 80==i --
it's reading the 80th octet from the input and writing the 80th octet to the output.
You must decide:
Do you want "i" to represent the number of input characters processed, or the number of output chars processed?
Because it's not possible to do both -- it's not possible to have 70 equal 80.
Perhaps something like this is closer to what you wanted:
/* test.c
http://stackoverflow.com/questions/15080239/c-how-to-fix-this-algorithm-for-z827-ascii-compression
WARNING: untested code.
*/
int compress(char *filename_ptr){
int in_fd;
in_fd = open(filename_ptr, O_RDONLY);
//set pointer to the end of the file, find file size, then reset position
//by closing/opening
unsigned int file_bytes = lseek(in_fd, 0, SEEK_END);
close(in_fd);
in_fd = open(filename_ptr, O_RDONLY);
//store file contents in buffer
unsigned char read_buffer[file_bytes];
read(in_fd, read_buffer, file_bytes);
//file where the output will be stored
int out_fd;
creat("output.txt", 0644);
out_fd = open("output.txt", O_WRONLY);
//sets file size in header (needed for decompression, this is the size of the
//file before compression. everything after this we write this 4-byte int
//is a 1 byte char
write(out_fd, &file_bytes, 4);
unsigned int writer = 0; // start with an empty bit buffer
unsigned int temp;
unsigned char out_char;
int i;
int writer_bits = 0; // 0 bits of data in writer so far
for(i = 0; i < file_bytes; i++){
// i is the number of (7 bit ASCII) characters
// read from the input so far.
// add 7 more bits to the writer
temp = read_buffer[i];
//moves the next char's bits to the left, for the purpose of filling the
//8 bit buffer (writer) via OR operation
//(avoid overwriting the "writer_bits" of good bits
//already in the buffer).
temp = read_buffer[i] << writer_bits;
writer = writer | temp;
writer_bits = writer_bits + 7;
//output right byte of writer
unsigned int right_byte = writer & 0x000000ff;
//output right_byte as a char
out_char = (unsigned char) right_byte;
// output 8 bits of data whenever
// we have *at least* 8 bits of data in the writer buffer.
if(writer_bits >= 8){
//write_buffer[i] = out_char;
write(out_fd, &out_char, 1);
//shift left side of writer to the right by 8
writer = writer >> 8;
writer_bits = writer_bits - 8;
}else{
// 7 or fewer bits in writer --
// skip writing until next time.
}
}
// are there any leftover bits still in writer?
if(writer_bits > 0){
// out_char may be stale here, so take the remaining bits from writer
out_char = (unsigned char) (writer & 0x000000ff);
write(out_fd, &out_char, 1);
}
return 0;
}
(Currently the program reads the entire input file into RAM, then writes the entire output file. Some programmers prefer to read a little at a time, then write a little at a time. Both approaches have advantages and disadvantages).
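Separating the bit packing from the file I/O makes the loop easier to check on paper. The sketch below packs from one memory buffer into another using the same accumulate-and-flush idea as the code above. `pack7` is a hypothetical name, and the bit order (low bits first) follows this answer's approach; it has not been checked against the reference z827 output shown in the question.

```c
#include <stddef.h>

/* Pack the low 7 bits of each input byte into a contiguous bit stream,
 * least significant bits first. out must have room for
 * (n * 7 + 7) / 8 bytes. Returns the number of bytes written. */
static size_t pack7(const unsigned char *in, size_t n, unsigned char *out)
{
    unsigned int acc = 0;   /* bits waiting to be written */
    int acc_bits = 0;       /* how many bits of acc are valid (0..14) */
    size_t written = 0;

    for (size_t i = 0; i < n; i++) {
        /* Append 7 new bits above the bits already waiting. */
        acc |= (unsigned int)(in[i] & 0x7F) << acc_bits;
        acc_bits += 7;
        if (acc_bits >= 8) {
            out[written++] = (unsigned char)(acc & 0xFF);
            acc >>= 8;
            acc_bits -= 8;
        }
    }
    if (acc_bits > 0)       /* flush the final partial byte */
        out[written++] = (unsigned char)(acc & 0xFF);
    return written;
}
```

Eight input characters produce exactly seven output bytes, and the input index and output index advance independently, which is the decoupling described above.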

Removing bytes in a dump or utf-8 in c

I have a C program at my fire station that captures incoming packets sent to the station printer. The program then scans each packet and sends an audible alert for the apparatus due on the call. The county recently started using UTF-8 packets, and the C program cannot deal with all the extra "00" bytes in the data flow. I need to either ignore the 00 bytes or set the program up to handle UTF-8. I have looked for days and found nothing concrete on handling UTF-8 that a novice such as myself can follow. Below is the part of the program that interprets the packets.
72 00 65 00 61 00 74 00 68 00 69 00 6e 00 67 00 later in packet
43 4f 44 45 53 45 54 3d 55 54 46 38 0a 40 50 4a beginning of packet
void compressUtf16 (char *buff, size_t count) {
int i;
for (i = 0; i < count; i++)
buff[i] = buff[i*2]; // for xx 00 xx 00 xx 00 ...
}
{ u_int i=0;
char *searcher = 0;
char c;
int j;
int locflag;
static int locationtripped = 0;
static char currentline[256];
static int currentlinepos = 0;
static char lastdispatched[256];
static char dispatchstring[256];
char betastring[256];
static int a = 0;
static int e = 0;
static int pe = 0;
static int md = 0;
static int pulse = 0;
static char location[128];
static char type[16];
static char station[16];
static FILE *fp;
static int printoutscanning = 0;
static char printoutID[20];
static char printoutfileID[32];
static FILE *dbg;
if(pulse) {
if(pulse == 80) {
sprintf(betastring, "beta a a a");
printf("betastring: \"%s\"\n", betastring);
system(betastring);
pulse = 0;
} else
pulse++;
}
if(header->len > 96) {
for(i=55; (i < header->caplen + 1 ) ; i++) {
c = pkt_data[i-1];
if(c == 13 || c == 10) {
currentline[currentlinepos] = 0;
currentlinepos = 0;
j = strlen(currentline);
if(j && (j > 1)) {
if(strlen(printoutfileID) && printoutscanning) {
dbg = fopen(printoutfileID, "a");
fprintf(dbg, "%s\n", currentline);
fclose(dbg);
}
if(!printoutscanning) {
searcher = 0;
searcher = strstr(currentline, "INCIDENT HISTORY DETAIL:");
if(searcher) {
searcher = searcher + 26;
strncpy(printoutID, searcher, 9);
printoutID[9] = 0;
printoutscanning = 1;
a = 0;
e = 0;
pe = 0;
md = 0;
for(j = 0; j < 128; j++)
location[j] = 0;
for(j = 0; j < 16; j++) {
type[j] = 0;
station[j] = 0;
}
sprintf(printoutfileID, "calls/%s %.6d.txt", printoutID, header-> ts.tv_usec);
dbg = fopen(printoutfileID, "a");
fprintf(dbg, "%s\n", currentline);
fclose(dbg);
}
UTF-8, except for the zero code point itself, will not have any zero bytes in it. The first byte of all multi-byte encodings (non-ASCII code points) will always start with the 11 bit pattern, with subsequent bytes always starting with the 10 bit pattern.
As you can see from the following table, U+0000 is the only code point that can give you a zero byte in UTF-8.
+----------------+----------+----------+----------+----------+
| Unicode | Byte 1 | Byte 2 | Byte 3 | Byte 4 |
+----------------+----------+----------+----------+----------+
| U+0000-007F | 0xxxxxxx | | | |
| U+0080-07FF | 110yyyxx | 10xxxxxx | | |
| U+0800-FFFF | 1110yyyy | 10yyyyxx | 10xxxxxx | |
| U+10000-10FFFF | 11110zzz | 10zzyyyy | 10yyyyxx | 10xxxxxx |
+----------------+----------+----------+----------+----------+
UTF-16 will intersperse zero bytes between your otherwise ASCII bytes but it's then a simple matter of throwing away every second byte. Whether that's 0, 2, 4, ... or 1, 3, 5, ... depends on whether your UTF-16 encoding is big-endian or little-endian.
I see from your sample that your data stream does indicate UTF-8 (43 4f 44 45 53 45 54 3d 55 54 46 38 translates to the text CODESET=UTF8) but I'll guarantee you it's lying :-)
The segment 72 00 65 00 61 00 74 00 68 00 69 00 6e 00 67 00 is UTF-16 for reathing, presumably a word segment since I'm not familiar with that word (in English, anyway).
I would suggest you clarify with whoever is generating that data since it's clearly erroneous. As to how you process the UTF-16, I've covered that above. Provided it's ASCII data in there (the alternate bytes are always zero), you can just throw away those alternates with something like:
// Process a UTF16 buffer containing ASCII-only characters.
// buff is the buffer, count is the quantity of UTF-16 chars.
// Will change buffer.
void compressUtf16 (char *buff, size_t count) {
int i;
for (i = 0; i < count; i++)
buff[i] = buff[i*2]; // for xx 00 xx 00 xx 00 ...
}
And, if you're using the other endian UTF-16, simply change:
buff[i] = buff[i*2]; // for xx 00 xx 00 xx 00 ...
into:
buff[i] = buff[i*2+1]; // for 00 xx 00 xx 00 xx ...
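If the data might contain anything other than plain ASCII, it can be safer to copy into a separate buffer and stop at the first code unit whose high byte is not zero. The following is a rough sketch under that assumption; `utf16le_to_ascii` is a made-up name and it only handles the little-endian layout (xx 00 xx 00 ...) seen in the sample packet.

```c
#include <stddef.h>

/* Hypothetical helper: copy the ASCII characters out of little-endian
 * UTF-16 data (xx 00 xx 00 ...). Stops at the first code unit that is
 * not plain ASCII, or when out is full. Always NUL-terminates out when
 * out_size > 0. Returns the number of characters copied. */
static size_t utf16le_to_ascii(const unsigned char *in, size_t in_bytes,
                               char *out, size_t out_size)
{
    size_t n = 0;
    if (out_size == 0)
        return 0;
    for (size_t i = 0; i + 1 < in_bytes && n + 1 < out_size; i += 2) {
        if (in[i + 1] != 0 || in[i] > 0x7F)
            break;              /* not ASCII: leave it to the caller */
        out[n++] = (char)in[i];
    }
    out[n] = '\0';
    return n;
}
```

Applied to the first eight bytes of the sample (72 00 65 00 61 00 74 00), this yields the string "reat". Unlike the in-place version, it leaves the original packet bytes untouched.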
