Reference for how Python handles data? - arrays

I have a list whose elements are <class 'bytes'>, each built from the two <class 'int'> bytes of a 16-bit PCM value. The list is the result of a direct read of a segment of a 16-bit PCM wave file. I then create a numpy array from that list to save it as a separate wave file for training, but wavfile.write() always fails because the 16-bit PCM data is somehow wrong. For example:
wavfile.write(savepath + 'wave_speechsegment_' + str(wavecnt) + '.wav', sr, nparray.astype(np.int16)) generates a ValueError: invalid literal for int() with base 10: b'z\xfe' error
And when I try nparray directly: wavfile.write(savepath + 'wave_speechsegment_' + str(wavecnt) + '.wav', sr, nparray) I get ValueError: Unsupported data type '|S2'
I try to set the list as 16-bit PCM values with:
hexval = struct.pack('<BB', val[0], val[1])
waveform.append(hexval)
nparray = np.array(waveform)
but when I put the 16-bit PCM values into the numpy array, Python reports:
nparray is type: <class 'numpy.ndarray'> and nparray[0] is: b'z\xfe' and is type: <class 'numpy.bytes_'>
Saving the numpy array segment to a file produces precisely the data found for that segment in the source wave file, such as:
7A FE DE FE C5 FF 75 00 2F 01 76 01 99 01 55 01 05 01 74 00 05 00 9D FF 79 FF 65 FF 8C FF C9 FF
Can someone point me to information about how python deals with data, so that I can keep my 16-bit PCM data as 16-bit PCM data?

In [73]: astr = b'z\xfe'
In [74]: type(astr)
Out[74]: bytes
In [75]: len(astr)
Out[75]: 2 # 2 bytes
This is not a list. It's a string, more specifically a byte string, as opposed to the default (for Python 3) unicode string.
An array created from such a string will have an S dtype:
In [76]: arr= np.array(astr)
In [77]: arr
Out[77]: array(b'z\xfe', dtype='|S2')
In [78]: arr= np.array(astr+astr+astr) # + joins strings into one
In [79]: arr
Out[79]: array(b'z\xfez\xfez\xfe', dtype='|S6')
The data buffer of the array contains those bytes, and it can be viewed as other compatible dtypes.
In [87]: arr= np.array([astr+astr+astr])
In [88]: arr
Out[88]: array([b'z\xfez\xfez\xfe'], dtype='|S6')
In [89]: arr.view('S1')
Out[89]: array([b'z', b'\xfe', b'z', b'\xfe', b'z', b'\xfe'], dtype='|S1')
In [94]: arr.view('int16')
Out[94]: array([-390, -390, -390], dtype=int16)
In [95]: arr.view('uint16')
Out[95]: array([65146, 65146, 65146], dtype=uint16)
In [98]: arr.view('>i2')
Out[98]: array([31486, 31486, 31486], dtype='>i2')
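Applied to the question's data, the same idea can be sketched as follows: join the 2-byte pieces into one buffer and reinterpret it as little-endian int16. This is only a sketch; the variable names `raw` and `samples` are made up here, and the sample bytes are the first four values from the hex dump in the question.

```python
import struct
import numpy as np

# First four 16-bit samples from the question's hex dump (little-endian PCM)
raw = bytes.fromhex("7A FE DE FE C5 FF 75 00")

# Mimic the question's loop: a list of 2-byte bytes objects
waveform = [struct.pack('<BB', raw[i], raw[i + 1]) for i in range(0, len(raw), 2)]

# Join the pieces into one buffer and reinterpret it as little-endian int16.
# np.array(waveform) would instead produce an 'S2' (byte string) array.
samples = np.frombuffer(b''.join(waveform), dtype='<i2')

print(samples)
print(samples.dtype.itemsize)  # bytes per sample
```

An array like `samples` has an integer dtype, which is the kind of array wavfile.write() expects for 16-bit PCM.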

Related

How do I read song meta information from an m4a file?

I have code that reads ID3 tags from an mp3 file, but now I have some m4a files. I found some info on the structure of these files, but that doesn't mention ID3 tags.
What's the best resource for m4a file structure?
Is the song metadata in the m4a structure, or in the contained audio file (which appears to be AAC)?
M4A is just a filename extension; the file is still an MP4 container, which consists of atoms/boxes (not chunks). The best resource is usually the vendor's own documentation, followed by experts with long experience, followed by additional details, followed by simplified explanations:
Apple: QuickTime File Format Specification; Metadata
ExifTool: QuickTime ItemList Tags
Multimedia.cx: QuickTime container; § 2.6: Meta data
xhelmboyx: MP4 layout = ISO 14496-1 Media Format
Just Solve the File Format Problem:
MP4, which uses the...
ISO Base Media File Format, which grew from...
QuickTime, which uses the...
Boxes/atoms format
Strictly by the standard, only the (MP4) container should have the overall metadata, and the streams inside should not be searched for metadata. However, don't rely on this, and don't ignore potentially valuable metadata that can be in any/all of the streams (video, audio, subtitles, pictures...). Containers are like archives: they contain one or more files, and for each file you're back where you began, because you have to recursively analyze that file again. AAC is far from the only possible audio stream/codec: you could also run across an MP3 in an MP4 container.
ID3 can occur in MP4 as atom ID32, as mentioned here, but this is rare and only allows version 2.x, not version 1.
In addition to the format's own metadata atoms, other metadata formats (not specifically aimed at music) can be embedded in the following atoms:
system | atom UUID with value | other atoms
---------|----------------------|------------
XMP | 0xBE 7A CF CB 97 A9 42 E8 9C 71 99 94 91 E3 AF AC | XMP_ or xml or xmlc
Exif | 0x05 37 cd ab 9d 0c 44 31 a7 2a fa 56 1f 2a 11 3e or JpgTiffExif->JP2 | exif or exfc
IPTC | 0x33 c7 a4 d2 b8 1d 47 23 a0 ba f1 a3 e0 97 ad 38 or 0x09 a1 4e 97 c0 b4 42 e0 be bf 36 df 6f 0c e3 6f | -
8BIM | 0x2c 4c 01 00 85 04 40 b9 a0 3e 56 21 48 d6 df eb | -
360fly | 0xef e1 58 9a bb 77 49 ef 80 95 27 75 9e b1 dc 6f | -
ID3 v2.x | - | ID32
Mostly the atoms in an MP4 have this layout:
- ftyp
- free
- mdat
+ moov
- mvhd
+ udta
- cprt
+ trak
- tkhd
+ udta
- cprt
+ edts
- elst
+ mdia
- mdhd
- hdlr
+ minf
- smhd
- hdlr
+ dinf
- dref
+ stbl
- stsd
- stts
- stsc
- stsz
- stco
+ meta
- hdlr (mdta)
- mhdr
+ keys
- mdta
- mdta
- mdta...
> ilst
+ (size, index)
- data (type, locale, value)
- itif
- name
- udta
> ctry
> lang
+ trak
https://docs.fileformat.com/audio/m4a/ has some details
https://github.com/ahyattdev/M4ATools has example code
The song metadata is stored in nested atoms (boxes) inside the M4A file.
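As a rough illustration of the atom/box nesting described above, here is a minimal Python sketch that walks the atoms in a byte buffer. It assumes the usual atom header: a 32-bit big-endian size followed by a four-character type, where size 1 means a 64-bit size follows and size 0 means the atom runs to the end of the enclosing range. The names `iter_atoms` and `atom` are made up for this example, and the demo buffer is synthetic, not a real M4A file.

```python
import struct

def iter_atoms(buf, start=0, end=None):
    """Yield (type, payload_start, atom_end) for each atom in buf[start:end]."""
    end = len(buf) if end is None else end
    pos = start
    while pos + 8 <= end:
        size, kind = struct.unpack_from(">I4s", buf, pos)
        header = 8
        if size == 1:
            # A 64-bit size follows the type field
            if pos + 16 > end:
                break
            size = struct.unpack_from(">Q", buf, pos + 8)[0]
            header = 16
        elif size == 0:
            # Atom extends to the end of the enclosing range
            size = end - pos
        if size < header or pos + size > end:
            break  # malformed or truncated atom; stop rather than guess
        yield kind.decode("latin-1"), pos + header, pos + size
        pos += size

def atom(kind, payload=b""):
    """Build a synthetic atom for the demo (hypothetical helper)."""
    return struct.pack(">I4s", 8 + len(payload), kind) + payload

# Synthetic buffer: ftyp, then moov containing mvhd and udta
demo = atom(b"ftyp", b"M4A ") + atom(b"moov", atom(b"mvhd", b"\x00" * 4) + atom(b"udta"))

top_level = [kind for kind, _, _ in iter_atoms(demo)]
moov_start, moov_end = next((s, e) for k, s, e in iter_atoms(demo) if k == "moov")
moov_children = [kind for kind, _, _ in iter_atoms(demo, moov_start, moov_end)]
print(top_level, moov_children)
```

Descending into a container is just a second call with the payload range of the parent. For real files, mdat can be very large, so seeking through the file is likely a better fit than loading everything into memory.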

What is the best way to write a Longs array into a HEX file in VBA?

I have an array of Longs (in the range of 10k or maybe 100k elements) that I have to write into a binary file (.wav) as 16-bit little-endian values. What is the best way to do this?
I used a simple PUT, but it didn't go as planned. I tried some test values that should yield
DE A8
CC 16
00 00
1E 5B
And I got instead
DE A8 FF FF CC 16 00 00 00 00 00 00 1E 5B 00 00
The code I used is below. Do you have any idea what happened and how to fix this? For every value I tried, there's always an extra 00 00 or FF FF between records.
Sub Gera_sinal()
Dim sinal() As Long
ReDim sinal(3)
'Test values
sinal(0) = -22306
sinal(1) = 5836
sinal(2) = 0
sinal(3) = 23326
'Creates a file and puts the values in it
Dim n_arq As Integer
Dim path As String
path = "C:\Users\DELL\Desktop\App\WAVs\Sinal_VBA.wav"
Set fs = CreateObject("Scripting.FileSystemObject")
Set a = fs.CreateTextFile(path, True)
a.Close
n_arq = FreeFile
Open path For Binary As n_arq
Put n_arq, , sinal
Close n_arq
End Sub
The storage size for the data type Long is 4 bytes. Therefore, since you've declared your array as Long, each value within the array will be stored in the file as 4 bytes.
So, for example, as you know, -22,306 converts to A8 DE in Hex. However, since the value is going to be stored as 4 bytes, the actual value stored in the file will be FF FF A8 DE, or DE A8 FF FF in little-endian.
Now, if the range of values is going to be between -32,768 and 32,767, inclusive, you can declare your array as Integer instead. This will mean that each value will be stored in the file as 2 bytes. So -22,306 will be stored as DE A8.
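The size difference can be reproduced outside VBA. The following Python sketch (Python is used purely for illustration; the variable names are made up) packs the question's test values once as 4-byte integers, like a VBA Long, and once as 2-byte integers, like a VBA Integer.

```python
import struct

values = [-22306, 5836, 0, 23326]

# '<i' = little-endian signed 32-bit, comparable to a VBA Long
as_long = struct.pack('<4i', *values)
# '<h' = little-endian signed 16-bit, comparable to a VBA Integer
as_integer = struct.pack('<4h', *values)

print(as_long.hex(' ').upper())
print(as_integer.hex(' ').upper())
```

The 32-bit packing contains the extra FF FF and 00 00 bytes seen in the question, which are the sign-extended upper halves of each value; the 16-bit packing contains only the expected byte pairs.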
By the way, you don't need the FileSystemObject object to truncate an existing file to 0 bytes. Instead, you can use the Kill statement to delete the file if it already exists, since the Open statement in Binary mode will create the file when it doesn't exist.
Accordingly, your macro can be re-written as follows...
Sub Gera_sinal()
Dim sinal() As Integer
ReDim sinal(3)
'Test values
sinal(0) = -22306
sinal(1) = 5836
sinal(2) = 0
sinal(3) = 23326
'Creates a file and puts the values in it
Dim n_arq As Integer
Dim path As String
path = "C:\Users\DELL\Desktop\App\WAVs\Sinal_VBA.wav"
'delete file, if it already exists
On Error Resume Next
Kill path
On Error GoTo 0
n_arq = FreeFile
Open path For Binary Access Write As n_arq
Put n_arq, , sinal
Close n_arq
End Sub

Detecting I-frame data in an MPEG-4 transport stream

I am testing a project. I need to corrupt the payload data (zeroing some bytes) of the MPEG-4 TS packets by a percentage supplied by the user. I do this by reading the ".ts" file packet by packet (188 bytes each). But the video becomes badly degraded after processing. (By the way, I'm writing the program in C.)
So I decided to find the data/packets that belong to I-frames and leave them untouched, scrambling only the other data by the given percentage. I found the following:
(in hex)
00 00 00 01 E0 start of video PES packet
..
..
00 00 01 B8 start of group of pictures header
..
..
00 00 01 00 the picture start code. This is 32 bits. The 10 bits immediately following it are the temporal reference, i.e. the whole first byte after the picture start code plus the first two bits of the second byte. These we need to skip. The next three bits (bits 3, 4 and 5 of the second byte after the picture start code) indicate the frame type: I, P or B. To get it, AND the second byte after the picture start code with 0x38 and right-shift the result by 3.
For example the data is like that;
00 00 01 00 00 0F FF F8 00 00 01 B5........... and so on.
Here the first four bytes 00 00 01 00 is the picture start code.
The fifth byte and the first two bits of the sixth byte is the temporal reference.
So our concern is in the sixth byte --> 0F
((0F & 38)>>3)
Frame type = 1 ==> I Frame
Frame type 000 forbidden
Frame type 001 intra-coded (I) - iframe
Frame type 010 predictive-coded (P) - p frame
Frame type 011 bidirectionally-predictive-coded (B) - b frame
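The bit extraction described above can be sketched in Python as follows. This is only an illustration of the MPEG-2 rule from the question: the function name `picture_coding_types` is made up, and the sketch assumes a contiguous video elementary stream, not raw 188-byte TS packets, where a picture header could be split across packets.

```python
PICTURE_START_CODE = bytes.fromhex("00000100")
FRAME_TYPES = {1: "I", 2: "P", 3: "B"}

def picture_coding_types(data):
    """Yield (offset, frame_type) for each MPEG-2 picture start code in data."""
    pos = data.find(PICTURE_START_CODE)
    while pos != -1 and pos + 5 < len(data):
        # Second byte after the start code: 2 bits of temporal reference,
        # then the 3-bit picture coding type (mask 0x38, shift right by 3)
        coding_type = (data[pos + 5] & 0x38) >> 3
        yield pos, FRAME_TYPES.get(coding_type, "other")
        pos = data.find(PICTURE_START_CODE, pos + 4)

# The example bytes from the question
sample = bytes.fromhex("00 00 01 00 00 0F FF F8 00 00 01 B5")
result = list(picture_coding_types(sample))
print(result)
```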
But this is for MPEG-2. Is there a similar pattern that lets me recognize the frame type with bitwise operations in an MPEG-4 transport stream (the extension is ".ts")?
I also need to know how many bytes or packets belong to that frame.
Thanks a lot for your help.
I would parse the complete TS packet. So first determine what PID your video stream belongs to (by parsing the PAT and PMT). Then find keyframes by looking for the 'Random Access indicator' bit in the Adaptation Field.
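#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Returns true if this 188-byte TS packet belongs to video_pid and has the
   random access indicator set in its adaptation field. */
bool is_keyframe_start(const uint8_t *pkt, uint16_t video_pid) {
    assert(0x47 == pkt[0]);                          /* sync byte */
    uint16_t pid = ((pkt[1] & 0x1F) << 8) | pkt[2];  /* 13-bit PID */
    if (pid != video_pid)
        return false;                                /* not the video stream */
    if ((pkt[3] & 0x20) && (pkt[4] > 0))             /* have a non-empty AF */
        return (pkt[5] & 0x40) != 0;                 /* found keyframe */
    return false;
}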
If you are using H.264, I and P frames can be told apart by specific byte sequences in the stream.
For example, 0x0000000165 marks an I frame and 0x00000001XX (with a different last byte) marks a P frame.
So parse the stream and look for these byte sequences to identify I or P frames.
Note that the exact byte values depend on the codec implementation.
For more information, look into FFmpeg.
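To make the byte patterns above a little more concrete: in an H.264 Annex B byte stream, the byte after each 00 00 01 start code is the NAL unit header, and its low five bits are the NAL unit type (5 is a slice of an IDR picture, 1 is a slice of a non-IDR picture). The following Python sketch extracts those types. The function name and the sample bytes are made up for illustration, and it assumes a contiguous elementary stream already extracted from the TS payloads.

```python
START_CODE = b"\x00\x00\x01"

def nal_unit_types(data):
    """Yield (offset, nal_unit_type) for each start code found in data."""
    pos = data.find(START_CODE)
    while pos != -1 and pos + 3 < len(data):
        # Low 5 bits of the NAL header byte are the NAL unit type
        yield pos, data[pos + 3] & 0x1F
        pos = data.find(START_CODE, pos + 3)

# Synthetic sample: one NAL unit with header 0x65, one with header 0x41
sample = bytes.fromhex("00 00 00 01 65 88 84 00 00 00 01 41 9A")
types = [t for _, t in nal_unit_types(sample)]
print(types)
```

Type 5 reliably marks a keyframe, but type 1 does not by itself say P or B: a non-IDR slice can also be intra-coded, so telling those apart requires parsing the slice header.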

Is there a pattern in these bitshifts?

I have some Nikon raw files (.nef) which were rendered useless during a USB transfer. However, the file size seems fine and only a handful of bytes differ, each by a value of -0x20 (hex), i.e. -32 (dec).
Some of the files could be recovered later from the same card with another computer, and now I am searching for a way to recover the other >100 files, which have the same error.
Is there a regular pattern? The offsets seem to be in intervals of 0x800 (2048 in dec).
Differences between the two files
1. /_LXA9414.dump: 13.703.892 bytes
2. /_LXA9414_broken.dump: 13.703.892 bytes
Offsets: hexadec.
84C00: 23 03
13CC00: B1 91
2FA400: 72 52
370400: 25 05
4B9400: AE 8E
641400: 36 16
701400: FC DC
75B400: 27 07
925400: BE 9E
A04C00: A8 88
AC2400: 2F 0F
11 difference(s) found.
Here are more diffs from other files:
http://pastebin.com/9uB3Hx43
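One way to look for a pattern is to tabulate which bits changed and where the offsets fall relative to 0x800. The Python sketch below does this for the eleven differences listed above (offset, byte in the intact file, byte in the broken file). It is only an analysis aid; whether the same holds for the other files is not checked here.

```python
# (offset, byte in intact file, byte in broken file), copied from the listing
diffs = [
    (0x84C00, 0x23, 0x03), (0x13CC00, 0xB1, 0x91), (0x2FA400, 0x72, 0x52),
    (0x370400, 0x25, 0x05), (0x4B9400, 0xAE, 0x8E), (0x641400, 0x36, 0x16),
    (0x701400, 0xFC, 0xDC), (0x75B400, 0x27, 0x07), (0x925400, 0xBE, 0x9E),
    (0xA04C00, 0xA8, 0x88), (0xAC2400, 0x2F, 0x0F),
]

for offset, good, bad in diffs:
    print(f"{offset:#08x}: xor={good ^ bad:#04x} offset % 0x800 = {offset % 0x800:#x}")

# Which bits differ, and where each offset sits inside a 0x800-byte block
flipped_bits = {good ^ bad for _, good, bad in diffs}
remainders = {offset % 0x800 for offset, _, _ in diffs}
print(flipped_bits, remainders)
```

For this listing, every difference is the single bit 0x20 cleared in the broken file, and every offset has remainder 0x400 modulo 0x800. The gaps between offsets are irregular, though, so this alone does not say which 0x800 blocks are affected.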

Convert unknown Hex digits to a Longitude and Latitude

F3 c8 42 14 - latitude //05.13637° should be nearby this coordinate
5d a4 40 b2 - longitude //100.47629° should be nearby this coordinate
This is the hex data I get from a GPS device. How do I convert it to readable coordinates?
I don't have a manual or any documentation. Please help, thanks.
22 00 08 00 c3 80 00 20 00 dc f3 c8 42 14 5d a4 40 b2 74 5d 34 4e 52 30 39
47 30 35 31 36 34 00 00 00
These are the full bytes I received, but the engineer told me that F3 c8 42 14 is the latitude and 5d a4 40 b2 is the longitude.
I worked with a Motorola GPS module once and the documentation said that the two hexes represented int types.
In your case, you might want to look at the documentation as well. If you know the model number, you can just google it.
Here is the documentation link for the motorola GPS I used.
Motorola GPS Module
I also took the liberty of doing some calculations for you. If your latitude was indeed
0x1442c8f3
(endianness does make a difference here). The integer equivalent is
339921139
in decimal system. If you divide that by 3600000 milliarcseconds
(where 1 deg = 60 min = 60 * 60 s = 60*60*1000 ms) you get
94.4225386
deg, which is close to your expectations. There isn't enough data to validate it, but I believe most GPS modules return milliarcseconds for both latitude and longitude.
Assuming the hex codes represent unencrypted 32-bit floating point numbers (they might not), you could try reading them into a C program and printing them out using printf("%f").
Don't forget that the words could have both endianness, i.e. the first one could be F3 C8 42 14 or 14 42 C8 F3 (bytes reversed).
Try it both ways and see if you get anything useful.
I wasn't able to get anything quickly from this online floating point calculator here.
Edit:
Building on Khanal's answer, this link to Latitude/Longitude suggests that the numbers are indeed fixed point and explains the sign convention.
Perhaps more useful for the calculations is HexIt, which allows choosing from a variety of C data types, both integer and floating point, as well as flipping back and forth between little and big endian representations.
I think the values are 32-bit floating point. However, the bytes are shifted by one position in the stream that you show. Taking longitude first: 100.47629 in 32-bit floating point is 42C8F3DC; these are bytes 10 through 13 in your stream (least significant byte first).
For latitude, 5.13637 in 32-bit floating point is 40A45D24; these are bytes 14 through 17, but the value in the byte stream is 40A45D14, so it's off a little in the least significant decimal digit (again, least significant byte first).
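This reading can be checked with a short Python sketch that unpacks two little-endian 32-bit floats from the stream. The starting position is the tenth byte, which is 0-based offset 9. The variable names are made up; the bytes are the first 20 bytes of the stream from the question.

```python
import struct

# First 20 bytes of the stream from the question
stream = bytes.fromhex(
    "22 00 08 00 c3 80 00 20 00 dc f3 c8 42 14 5d a4 40 b2 74 5d"
)

# Two little-endian float32 values starting at 0-based offset 9
longitude, latitude = struct.unpack_from("<ff", stream, 9)
print(longitude, latitude)
```

Under this interpretation the first value is the longitude and the second the latitude, each starting one byte earlier than the engineer's byte groups.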

Resources