array storage and multiplication in verilog - arrays

I have a peripheral connected to my Altera FPGA and am able to read data from it using SPI. I would like to store this incoming data into an array, preferably as a floating point value. Further, I have a CSV file on my computer and want to store that data in another array, and then after triggering a 'start' signal multiply both arrays and send the output via RS-232 to my PC. Any suggestions on how to go about this? The code for reading data from the peripheral is as follows:
// We sample on the negative edge of the clock.
always @(negedge SCL)
begin
    // Data arrives MSB first.
    MOSI_reg[31:0] <= {MOSI_reg[30:0], MOSI}; // left shift for MOSI data
    MISO_reg[31:0] <= {MISO_reg[30:0], MISO}; // left shift for MISO data
end
thank you.

A 1024x28 matrix of 32-bit elements requires 917,504 bits of RAM in your FPGA, plus another 28*32 = 896 bits for the SPI data. Multiplying these two results in a 1024x1 vector, so add 32,768 bits for the result. That sums to 951,168 bits you will need in your device. Does your FPGA have this much memory?
Assuming you do: you can instantiate a ROM inside your design and initialize it with $readmemh or $readmemb (for values in hexadecimal or binary form, respectively).
If precision is not an issue, go for fixed point, as implementing floating-point multiplication and addition in hardware is hard work.
You will then need an FSM to fill your source vector with SPI data, do the multiplication, and store the result in your destination vector. You may consider instantiating a small soft processor to do the job more easily.
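As a rough illustration of the ROM idea, here is a minimal sketch of a coefficient ROM initialized with $readmemh. The file name "coeffs.hex", the fixed-point conversion of the CSV values and the sizes are assumptions; adapt them to your design.
// Hypothetical 1024x28-word coefficient ROM, filled at elaboration time from
// "coeffs.hex" (one 32-bit hex word per line, already converted to fixed point).
module coeff_rom (
    input  wire        clk,
    input  wire [14:0] addr,   // 1024*28 = 28672 entries -> 15 address bits
    output reg  [31:0] data
);
    reg [31:0] mem [0:28671];

    initial $readmemh("coeffs.hex", mem);

    always @(posedge clk)
        data <= mem[addr];     // registered read helps the tools infer block RAM/ROM
endmodule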

Multiplication is non-trivial in hardware, and 'assign c = a*b' is not necessarily going to produce what you want.
If your FPGA has DSP blocks, you can use one of Altera's customizable IP cores to do your multiplication in a DSP block. If not, you can still use an IP core to tune the multiplier the way you want (with regards to signed/unsigned, latency, etc.) and likely produce a better result.
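For comparison, this is roughly what direct inference looks like when you register the operands and the product; most tools will then map it onto a DSP block, though the IP core route gives you explicit control over signedness and latency. The widths here are assumptions.
// Minimal sketch of an inferred, registered signed multiplier.
module mult16 (
    input  wire               clk,
    input  wire signed [15:0] a,
    input  wire signed [15:0] b,
    output reg  signed [31:0] p
);
    reg signed [15:0] a_r, b_r;

    always @(posedge clk) begin
        a_r <= a;
        b_r <= b;
        p   <= a_r * b_r;   // registering the product helps timing and DSP inference
    end
endmodule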

Related

Array vs long vectors in Verilog

I am writing Verilog code that needs to hold a lot of data in a memory-like structure.
I have implemented this using both an array of vectors and a really long single vector.
I don't think there is any difference internally, but if there is, which is the better way of storing data?
I'm actually writing code that will be synthesized onto a board, so any practical advice from those who have a lot of experience with FPGAs would help.
For example, I could store 32x1024 data using
reg[31:0] temp_storage [0:1023]
or
reg[32767:0] temp_storage
The array method is much easier for the programmer to manage, but is there any disadvantage from the perspective of the hardware?
Would it actually be the same if I declared everything one by one?
reg[31:0] temp_storage0001
reg[31:0] temp_storage0002
.
.
.
reg[31:0] temp_storage1024
Thank you.
There is a big difference between these two formats.
Format one:
reg[31:0] temp_storage [0:1023];
This can be mapped* onto a memory block and as such will use significantly fewer FPGA registers/resources. But there is a penalty: you can read or write at most two entries at a time.
(Two entries if you use dual-ported memory, one if you use single-ported memory. Embedded block RAMs in FPGAs are all dual-ported these days.)
Format two:
reg[31:0] temp_storage0001
reg[31:0] temp_storage0002
In this case every temp_storage... value is stored in a separate set of 32 registers. You can read and write as many of them simultaneously as you like (or at least until you run out of FPGA resources). The flexibility is much greater, but this will use up your FPGA gates/LUTs much faster.
* As @B.Go says: check your FPGA documentation for how exactly to get this mapped onto memory, or instantiate a memory macro/IP directly.
What exactly do you mean by a maximum of two entries: temp_storage[0] <= t_data; temp_storage[1] <= t_data;?
reg[31:0] temp_storage [0:1023];
The above definition is for a memory which has 1024 entries, each 32 bits wide. You can select two entries out of the 1024, and you can read or write each of those entries. (A dual-ported memory normally has two address buses, two read data ports, two write data ports and often also two clocks, one per port.)
You would normally access this memory using something like:
always @(posedge clk)
begin
    if (write_enable_0)
        temp_storage[address_0] <= write_data_0;
    else
        read_data_0 <= temp_storage[address_0];

    if (write_enable_1)
        temp_storage[address_1] <= write_data_1;
    else
        read_data_1 <= temp_storage[address_1];
end
Find the 'memory' section or application note for your FPGA family; it will tell you how to do this and also the pitfalls (e.g. writing and reading the same location at the same time).

Do an 8-bit write to a 32-bit register

I'm trying to read the clock source value of a given generic clock generator on the Samd21 MCU.
The datasheet says that if I wish to read the GENCTRL register (containing the clock source value), I need to "do an 8-bit write" and read the register afterwards. How can I do that, given that the register is 32 bits wide?
I'm afraid, that by doing the following, I am actually changing the generic clock generator X's configuration:
GCLK->GENCTRL.reg = GCLK->GENCTRL.reg & 0xFFFFFFF0 | 0x0000000X
Keep in mind that the lower 8 bits of GENCTRL are reserved for the generic clock generator's ID.
Below is the part of the datasheet containing the instructions for reading the GENCTRL register.
The ARM registers are 32 bit. The peripheral registers (in general) will be arranged at 4 byte offsets, but will not always implement all 32 bits that this implies.
This is most obvious when the upper bits of a peripheral register are 'read as zero, write ignored'. You might occasionally see a newer or more featured version of the peripheral where some of these unused bits become used in the future.
Depending on exactly how a specific peripheral is connected to the core, it is generally possible to perform byte, half-word or word accesses to any region of memory. Provided this is supported, only the relevant bytes will be updated. Where there is a restriction (for example a 32-bit APB bus where only byte access is supported), this should be clearly identified in the documentation. With an AArch64 processor, it is even possible to write two registers at once!
Do note that the peripheral 'knows' the access size (at least the information is present on the internal bus), so it is possible to specify different behaviour for a byte access than for a word access (even if this is the sort of confusing behaviour that is best avoided). To generalise, any memory-mapped peripheral is more of an observer of the bus than a true implementation of memory - the designer is free to play tricks with the full address/data/control bus bit combinations, and implement bitmasks, read/modify/write, access locks, magic values, etc.
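As a rough illustration of what such an 8-bit access can look like in C, here is a minimal sketch. The GCLK->GENCTRL.reg name follows the question's vendor headers, but the helper function, its signature and the omission of any synchronization wait are assumptions; treat it as a sketch of the access size only and follow the datasheet's exact read sequence.
#include <stdint.h>

/* Illustrative only: write just the low byte (the generator ID) of a 32-bit
 * register with a single 8-bit bus access, then read the whole register back.
 * Assumes a little-endian bus (byte 0 maps to bits 7:0) and that the register
 * supports byte accesses; any required synchronization wait is omitted here. */
static inline uint32_t read_genctrl(volatile uint32_t *genctrl, uint8_t gen_id)
{
    volatile uint8_t *low_byte = (volatile uint8_t *)genctrl;
    *low_byte = gen_id;   /* 8-bit write selects the generator, other bytes untouched */
    return *genctrl;      /* subsequent 32-bit read returns that generator's config */
}
Used, for example, as uint32_t cfg = read_genctrl(&GCLK->GENCTRL.reg, 3); for generator 3 (again, only a sketch of the access pattern).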

RGB video ADC Conversion Color Palettes

I'm trying to better understand analog-to-digital video conversion and was hoping for some direction. The way I understand it, a dedicated 10-bit ADC chip will read the voltage of the R, G and B input pins, translate this to 10-bit RGB, and output these values in parallel across 30 pins (ignoring sync/clock pins etc.). My question however is this: if you know the source only has 5 bits per color, (2^5)^3 = 32,768 colors, and dumps this to analog RGB, and you are using a 10-bit ADC, will the ADC interpolate colors due to voltage variances and the increase from 5 to 10 bits, thus introducing unoriginal/unintended colors, or is the analog-to-digital sampling truly so precise that the original source color palette will be preserved correctly?
Most ADCs have a precision of about 1 LSB, so the lowest bit will toggle randomly anyway. If you need it to be stable, either use oversampling at an increased sampling frequency, or use a 12-bit ADC; that one will have a toggling LSB as well, but bit 2 will probably be stable.
Why "probably", you ask? Well, if your transmission line is noisy or badly coupled, it can introduce additional toggling in the LSB range, or even higher. In bad cases noise can even corrupt the upper 5 bits of data.
There might also be analog filters / ferrite beads / something else smoothing your signal, so you may not even see the actual "steps" in the analog waveform.
So you never know until you test it. Try looking at your signal with a scope; that might resolve some of your doubts.

Combine two bytes from gyroscope into signed angular rate

I've got two 8-bit chars. They're the product of some 16-bit signed float being broken up into MSB and LSB inside a gyroscope.
The standard method I know of for combining two bytes is this:
(signed float) = (((MSB value) << 8) | (LSB value));
It just returns garbage.
How can I do this?
Okay, so, dear me from ~4 years ago:
First of all, the gyroscope you're working with is a MAX21000. The datasheet, as far as future-you can see, doesn't actually describe the endianness of the I2C connection, which probably also tripped you up. However, the SPI section does state that the data is transmitted MSB first, with the top 8 bits of the axis data in the first byte and the remaining 8 in the next.
To your credit, the datasheet doesn't really go into what type those 16 bits represent - however, that's because it's standardized across manufacturers.
The real reason why you got such meaningless values when converting to float is that the gyro isn't sending a float. Why'd you even think it would?
The gyro sends a plain ol' int16 (short). A simple search for "i2c gyro interface" would have made that clear. How do you get that into a decimal angular rate? You divide by 32,768 (2^15), then multiply by the full-scale range set on the gyro.
Simple! Here, want a code example?
float X_angular_rate = ((int16_t)((byte_1 << 8) | byte_2) / 32768.0f) * GYRO_SCALE; // divide as float so the fraction isn't truncated
However, I think that it's important to note that the data from these gyroscopes alone is not, in itself, as useful as you thought; to my current knowledge, due to their poor zero-rate drift characteristics, MEMS gyros are almost always used in a sensor fusion setup with an accelerometer and a Kalman filter to make a proper IMU.
Any position and attitude derived from dead-reckoning without this added complexity is going to be hopelessly inaccurate after mere minutes, which is why you added an accelerometer to the next revision of the board.
You have shown two bytes, and float is 4 bytes on most systems. What did you do with the other two bytes of the original float you deconstructed? You should preserve and reconstruct all four original bytes if possible. If you can't, and you have to omit any bytes, set them to zero and make them the least significant bits of the float's fractional part; hopefully you'll get an answer with satisfactory precision.
The diagram below shows the bit positions, so acting in accordance with the endianness of your system, you should be able to construct a valid float based on how you deconstructed the original. It can really help to write a function to display values as binary numbers and line them up and display initial, intermediate and end results to ensure that you're really accomplishing what you think (hope) you are.
To get a valid result you have to put something sensible into those bits.
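If the four original bytes really do hold an IEEE-754 float, a type-safe way to reassemble them is to copy the raw bytes into a float with memcpy rather than shifting them into the float directly. A minimal sketch, assuming the bytes were deconstructed least-significant byte first (swap the order otherwise):
#include <stdint.h>
#include <string.h>

/* Reassemble a 32-bit IEEE-754 float from its four raw bytes.
 * Assumes b0 is the least significant byte of the original representation. */
static float bytes_to_float(uint8_t b0, uint8_t b1, uint8_t b2, uint8_t b3)
{
    uint32_t bits = (uint32_t)b0 | ((uint32_t)b1 << 8) |
                    ((uint32_t)b2 << 16) | ((uint32_t)b3 << 24);
    float result;
    memcpy(&result, &bits, sizeof result);  /* avoids strict-aliasing problems */
    return result;
}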

Storing Large Integers/Values in an Embedded System

I'm developing an embedded system that can test a large number of wires (up to 360) - essentially a continuity-checking system. The system works by clocking in a test vector and reading the output from the other end. The output is then compared with a stored result (which would be on an SD card) that tells what the output should have been. The test-vectors are just walking ones, so there's no need to store them anywhere. The process goes roughly as follows:
Clock out test-vector (walking ones)
Read in output test-vector.
Read corresponding output test-vector from SD Card which tells what the output vector should be.
Compare the test-vectors from step 2 and 3.
Note down the errors/faults in a separate array.
Continue back to step 1 unless all wires are checked.
Output the errors/faults to the LCD.
My hardware consists of a large shift register that's clocked into the AVR microcontroller. For every test vector (which is also 360 bits), I will need to read in 360 bits. So, for 360 wires the total amount of data would be 360*360 bits, or about 16 kB. I already know I cannot do this in one pass (i.e. read in all the data and then compare), so it will have to be done test-vector by test-vector.
As there are no built-in types that can hold such large numbers, I intend to use a bit array 360 bits long. Now, my question is: how should I store this bit array in a text file?
One way is to store raw values, i.e. on each line store the raw binary data that I read in from the shift register. So, for 8 wires, it would be 0b10011010. But this can get ugly for up to 360 wires - each line would contain 360 characters.
Another way is to store hex values - this would just be two characters for 8 bits (9A for the above) and about 90 characters for 360 bits. This would, however, require me to read in the text - line by line - and somehow convert each hex value into the bit-array representation.
So what's the best solution for this sort of problem? I need the solution to be completely "deterministic" - I can't have calls to malloc or the like. They are a bit of a no-no in embedded systems from what I've read.
SUMMARY
I need to store large values that can't be represented by any traditional variable types. Currently I intend to store these values in a bit array. What's the best way to store these values in a text file on an SD card?
These are not integer values but rather bit maps; they have no arithmetic meaning. What you are suggesting is simply a byte array of length 360/8 = 45, and is not related to "large integers" at all. However, a more appropriate data structure or representation may be possible.
If the test vector is a single set bit out of 360, then it is both inefficient and unnecessary to store 360 bits for each vector; a value from 0 to 359 is sufficient to unambiguously define each vector. If the correct output is also a single bit, then that could likewise be stored as a bit index; if not, you could store it as a list of indices of the bits that should be set, with some sentinel value >= 360 or < 0 to indicate the end of the list. Where most vectors contain fewer than 22 set bits (at two bytes per index), this structure will be more efficient than storing a 45-byte array.
From any bit index value, you can determine the address and mask of the individual wire by:
byte_address = base_address + bit_index / 8 ;
bit_mask = 0x01 << (bit_index % 8) ;
You could either test each of the 360 bits iteratively or generate a 360-bit vector on the fly from the list of indices, as in the sketch below.
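A minimal sketch of that second option, expanding a sentinel-terminated index list into a packed 45-byte bit array using only a fixed-size buffer (the uint16_t index type and the 0xFFFF sentinel are assumptions consistent with the scheme above):
#include <stdint.h>
#include <string.h>

#define NUM_WIRES    360
#define VECTOR_BYTES ((NUM_WIRES + 7) / 8)   /* 45 bytes */
#define END_OF_LIST  0xFFFF                  /* assumed sentinel: any value >= 360 ends the list */

/* Expand a sentinel-terminated list of bit indices into a packed bit array.
 * No dynamic allocation: the caller supplies a fixed 45-byte buffer. */
static void indices_to_vector(const uint16_t *indices, uint8_t vector[VECTOR_BYTES])
{
    memset(vector, 0, VECTOR_BYTES);
    for (; *indices < NUM_WIRES; ++indices) {
        vector[*indices / 8] |= (uint8_t)(1u << (*indices % 8));  /* same address/mask arithmetic as above */
    }
}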
I can see no need for dynamic memory allocation in this, but whether or not it is advisable in an embedded system is largely dependent on the application and target resources. A typical AVR system has very little memory, and dynamic memory allocation carries an overhead for heap management and block alignment that you may not be able to afford. Dynamic memory allocation is also not suited to situations where hard real-time deterministic timing is required. And in all cases you should have a well-defined strategy or architecture for avoiding memory leaks (repeatedly allocating memory that never gets released).
