How to decompress a warc.zst file? - archive

I am trying to decompress a WARC ZST file that I downloaded from here: https://archive.org/details/archiveteam_yahooanswers_20210422220546_c4fac540
I tried the command zstd -d yahooanswers_20210422220546_c4fac540.1619026173.megawarc.warc.zst but I got this error:
73.megawarc.warc.zst : 0 MB... 73.megawarc.warc.zst : Decoding error (36) : Dictionary mismatch
How can I find the said dictionary or are there any alternatives to this?

The dictionary can be found inside the first skippable frame of the warc.
To extract the dictionary OrIdow6 write this to extract it: https://transfer.notkiska.pw/inline/TXlRo/xtract.py
You'll require python3, zstd and zstandard
python ./xtract.py /path/to/megawarc.warc.zst > dict
Then you can
zstd -d /path/to/megawarc.warc.zst -D dict
And you should be able to view the megawarc with your standard warc viewing tools

Related

SWUpdate on RPi4 via yocto - error parsing configuration file

After booting SWUpdate yocto-generated image for the first time, executing swupdate results in error message:
Error parsing configuration file: 'globals' section missing, exiting.
I tried to strictly follow SWUpdate's documentation, but it gets short when it comes to yocto integration. I'm using meta-swupdate, meta-swupdate-boards, and meta-openembedded layers together with poky example repository all at Kirkstone tag, building via bitbake update-image and having modyfied local.conf as:
MACHINE ??= "raspberrypi4-64"
ENABLE_UART = "1"
RPI_USE_U_BOOT = "1"
IMAGE_FSTYPES = "wic ext4.gz"
PREFERRED_PROVIDER_u-boot-fw-utils = "libubootenv"
IMAGE_INSTALL:append = " swupdate"
Is there anything else I need to modify to generate the configuration file and be able to run SWUpdate binary properly?
Side question: In the documentation, it's recommended to append swupdate-www to achieve a better web server. However, if I append it, there is no swupdate-www binary inside the `/usr/bin' directory.
As with other recipes folders the recipes-support/swupdate/swupdate/raspberrypi4-64 folder was missing inside the meta-swupdate-boards layer. Therefore, an empty config file was always generated. After adding this folder and all related files, strongly inspired by raspberrypi3 folder, the error was gone and swupdate -h provided the expected output.
There was also one new error during build process thrown by yocto. It was related to missing systemd requirement and was solved by adding:
DISTRO_FEATURES_append = " systemd"
to local.conf

Create Yocto Filesystem of type rootfs.img

I'm trying to create filesystem of extension rootfs.img from Yocto. Adding IMAGE_FSTYPE="img" is failing, saying img is not recognized because its definition is not defined in any meta class.
I have looked into using wic, but cant find the command that should go in .wks file
Any ways in which I can create rootfs.img (instead of rootfs.tar.gz or rootfs.ext4) in Yocto?
Tried wic & IMAGE_FSTYPES="img"
Solved it this way :
Added function oe_mkimgfs() similar to https://git.yoctoproject.org/poky/tree/meta/classes/image_types.bbclass#n63
Modified mkfs cmd to mkfs.$fstype -F $extra_imagecmd ${IMGDEPLOYDIR}/${IMAGE_NAME}${IMAGE_NAME_SUFFIX}.img -d ${IMAGE_ROOTFS}
Also added other macros :
EXTRA_IMAGECMD_img ?= "-i 4096"
do_image_img[depends] += "e2fsprogs-native:do_populate_sysroot"
RUNNABLE_IMAGE_TYPES ?= "ext2 ext3 ext4 img"
IMAGE_CMD_img = "oe_mkimgfs ext4 ${EXTRA_IMAGECMD}"

Alpine APK Package Repositories, how are the checksums calculated?

I'm trying to work out how the pull checksum for packages is calculated within Alpine APK package repositories. The documentation regarding the format is lacking in any detail.
When I run apk index -o APKINDEX.unsigned.tar.gz *.apk which generates the repository. When you extract the txt file from inside the gz, it contains the following...
C:Q17KXT6xFVWz4EZDIbkcvXQ/uz9ys=
P:redis-server
V:3.2.3-0
A:noarch
S:2784844
I:102400
T:An advanced key-value store
U:http://redis.io/
L:
D:linux-headers
I'm interested in how the very first line is generated. I've tried to read the actual source that's used to generate this, but I'm not a C programmer, so it's hard for me to comprehend as it jumps all over the place.
The two files mentioned in the documentation are database.c and package.c.
Incase this somewhat helps, the original APK file has these various hashes...
CRC32 = ac17ea88
MD5 = a035ecf940a67a6572ff40afad4f396a
SHA1 = eca5d3eb11555b3e0464321b91cbd743fbb3f72b
SHA256 = 24bc1f03409b0856d84758d6d44b2f04737bbc260815c525581258a5b4bf6df4
The pull checksum is the sha1sum of the second tar.gz file in the apk file, containing the .PKGINFO file.
The Alpine APK package is actually a concatenation in disguise of 3 tar.gz files.
We can split the package below using gunzip-split into 3 .gz files, then rename them to .tar.gz
./gunzip-split -s -o ./out/ strace-5.14-r0.apk
mv ./out/file_1.gz ./out/file_1.tar.gz
mv ./out/file_2.gz ./out/file_2.tar.gz
mv ./out/file_3.gz ./out/file_3.tar.gz
sha1sum ./out/file_2.tar.gz
7a266425df7bfd7ce9a42c71a015ea2ae5715838 out/file_2.tar.gz
tar tvf out/file_2.tar.gz
-rw-r--r-- root/root 702 2021-09-03 01:34 .PKGINFO
In the case of the strace package the checksum value can be derived as above:
apk index strace-5.14-r0.apk -o APKINDEX.tar.gz
tar xvf APKINDEX.tar.gz
cat APKINDEX
echo eiZkJd97/XzppCxxoBXqKuVxWDg=|base64 -d|xxd
00000000: 7a26 6425 df7b fd7c e9a4 2c71 a015 ea2a z&d%.{.|..,q...*
00000010: e571 5838 .qX8
When comparing them we see that they match.
References
https://github.com/martencassel/apk-tools/blob/master/README.md
https://gitlab.com/cg909/gunzip-split/-/releases
https://lists.alpinelinux.org/~alpine/devel/%3C257B6969-21FD-4D51-A8EC-95CB95CEF365%40ferrisellis.com%3E#%3C20180309152107.472e4144#vostro.util.wtbts.net%3E
So...
/* Internal cointainer for MD5 or SHA1 */
struct apk_checksum {
unsigned char data[20];
unsigned char type;
};
Basically take the C: value then chop off the Q from the front then base 64 decode. Chop off the last value (type which defaults to SHA1) then you have your sha1. This appears to be made of the CONTENTS of the package but that would take further looking into it.
You need to look here: https://git.alpinelinux.org/cgit/apk-tools/tree/src/blob.c#n492
It is apk_blob_pull_csum
First 'Q' stands for encoding
Next '1' stands for SHA1
Looks like this checksum is made database.c in apk_db_unpack_pkg:
apk_sign_ctx_init(&ctx.sctx, APK_SIGN_VERIFY_IDENTITY, &pkg->csum, db->keys_fd);
tar = apk_bstream_gunzip_mpart(bs, apk_sign_ctx_mpart_cb, &ctx.sctx);
r = apk_tar_parse(tar, apk_db_install_archive_entry, &ctx, TRUE, &db->id_cache);
but I'm not sure, because I failed to trace this code.
It is really not easy to understand what are they doing.

How to use Axis2c to generate C files from WSDL file

I want to use a webservice in C code. I am trying to make a client. I need something to do what Axis2java does and generates the classes from a wsdl files.
I found that Axis2c makes (.c) files generated from wsdl file.
I downloaded it from here . unzipped it. I created the environment variable for AXIS2C_HOME and I created AXIS2C_CLASSPATH.
but I can't make it work.
when I type this command :
WSDL2C -uri -ss -sd -d none -u -f -o
I get this error :
echo off
Error: Could not find or load main class org.apache.axis2.wsdl.WSDL2C
how can I solve this problem. and please tell me how to use this Axis2c tool properly.
Thank you in advance.
#loentar : I installed Axis2/Java and I set the environment variable for it. now I run the wsdl2c.bat I get this :
E:\dev\Tools\axis2c-bin-1.6.0-win32\bin\tools\wsdl2c>WSDL2C.bat
E:\dev\Tools\axis2c-bin-1.6.0-win32\bin\tools\wsdl2c>echo off
Usage: java [-options] class [args...]
(to execute a class)
or java [-options] -jar jarfile [args...]
(to execute a jar file)
where options include:
-d32 use a 32-bit data model if available
-d64 use a 64-bit data model if available
-server to select the "server" VM
The default VM is server.
-cp
-classpath
A ; separated list of directories, JAR archives,
and ZIP archives to search for class files.
-D=
set a system property
-verbose:[class|gc|jni]
enable verbose output
-version print product version and exit
-version:
require the specified version to run
-showversion print product version and continue
-jre-restrict-search | -no-jre-restrict-search
include/exclude user private JREs in the version search
-? -help print this help message
-X print help on non-standard options
-ea[:...|:]
-enableassertions[:...|:]
enable assertions with specified granularity
-da[:...|:]
-disableassertions[:...|:]
disable assertions with specified granularity
-esa | -enablesystemassertions
enable system assertions
-dsa | -disablesystemassertions
disable system assertions
-agentlib:[=]
load native agent library , e.g. -agentlib:hprof
see also, -agentlib:jdwp=help and -agentlib:hprof=help
-agentpath:[=]
load native agent library by full pathname
-javaagent:[=]
load Java programming language agent, see java.lang.instrument
-splash:
show splash screen with specified image
See http://www.oracle.com/technetwork/java/javase/documentation/index.html for m
ore details.
after that I run this command :
E:\dev\Tools\axis2c-bin-1.6.0-win32\bin\tools\wsdl2c>WSDL2C.bat -uri hello.wsdl
-u -uw
E:\dev\Tools\axis2c-bin-1.6.0-win32\bin\tools\wsdl2c>echo off
Unrecognized option: -uri
Error: Could not create the Java Virtual Machine.
Error: A fatal exception has occurred. Program will exit.
what can I do ?
I'm using windows 8 by the way.
In addition to Axis2/C you must have Axis2/Java installed.
AXIS2_HOME must point to Axis2/Java installation.
For details please see README of codegen.
The complete list of commands to create and compile client is:
# create stubs
sh $AXIS2C_HOME/bin/tools/wsdl2c/WSDL2C.sh -uri Calculator.wsdl -u -uw
# implement main() in src/your_client.c
# see samples/codegen/client/calculator for example
# compile and link client
gcc -o calculator_client src/*.c -I$AXIS2C_HOME/include/axis2-1.6.0 -L$AXIS2C_HOME/lib -laxutil -laxis2_axiom -laxis2_parser -laxis2_engine -lpthread -laxis2_http_sender -laxis2_http_receiver -ldl -Wl,--rpath -Wl,$AXIS2C_HOME/lib
I set the envinroment variable for JAVA_HOME, AXIS2_HOME, AXIS2C_HOME, and added their lib folder to CLASSPATH. after running this command:
WSDL2C.bat -uri hello.wsdl -u -uw
I got this message:
echo off
Error: Could not find or load main class org.apache.axis2.wsdl.WSDL2C
I found the solution myself. :)
I double checked if I had created the environment variable for AXIS2_HOME and I saw that it is there, correctly.
in spite of it's existence I tried to set it again in command prompt. so I typed:
SET AXIS2_HOME=E:\dev\Tools\axis2-1.6.2
then I typed the command for WSDL2C code generator:
WSDL2C.bat -uri hello.wsdl -u -uw
And BAM ! it worked properly.
Now I can generate C files from WSDL file.

reprepro complains about the generated pbuilder debian.tar.gz archive md5

I have configured a private APT repository (using resources on internet like http://inodes.org/2009/09/14/building-a-private-ppa-on-ubuntu/) and I'm uploading for the first time my package containing the sources of my C++ application.
So reprepro repository is empty.
I use the following command in order to start the build:
sudo reprepro -V -b /srv/reprepro processincoming incoming
Then the build start, a lot of output is genearated and I can see that pbuilder is compiling the project source code and everything is fine. I can even find in the result/ folder debian packages etc...
But the build failed with a POST_BUILD_FAILED because it seems that pbuilder has changed the douane-testing_0.8.1-apt1.debian.tar.gz file and the md5 sum is now different as shown here:
File "pool/main/d/douane-testing/douane-testing_0.8.1-apt1.debian.tar.gz" is already registered with different checksums!
md5 expected: 97257ae2c5790b84ed7bb1b412f1d518, got: df78f88b97cadc10bc0a73bf86442838
sha1 expected: ae93c44593e821696f72bee4d91ce4b6f261e529, got: d6f910ca5707ec92cb71601a4f4c72db0e5f18d9
sha256 expected: c3fac5ed112f89a8ed8d4137b34f173990d8a4b82b6212d1e0ada1cddc869b0e, got: ebdcc9ead44ea0dd99f2dc87decffcc5e3efaee64a8f62f54aec556ac19d579c
size expected: 2334, got: 2344
There have been errors!
I don't understand why it is failing as when I compare the 2 packages (having those md5 sums) the content is strictly the same (I used a diff tool but no differences and no new or removed files).
The only thing I can see is that the archive from pbuild is bigger of 10 Bytes than the orginal one I have uploaded:
On my development machine, the file with the md5 97257ae2c5790b84ed7bb1b412f1d518 :
-rw-r--r-- 1 zedtux zedtux 2334 Feb 3 23:38 douane-testing_0.8.1-apt1.debian.tar.gz
On my server, the file with the md5 df78f88b97cadc10bc0a73bf86442838 :
-rw-r--r-- 1 root root 2344 Feb 5 00:58 douane-testing_0.8.1-apt1.debian.tar.gz
I have pbuild version 0.213 on my server.
What could be the reason of this behavior and how can I fix it ?
Edit
I'm suspecting an issue with the GPG key which looks missing and then files aren't signed so md5sum is different.
During the build process I have the following lines:
I: Extracting source
gpgv: Signature made Wed Feb 5 22:04:37 2014 UTC using RSA key ID 9474CF36
gpgv: Can't check signature: public key not found
dpkg-source: warning: failed to verify signature on ./douane-testing_0.8.1-apt1.dsc
Edit 2
I have tried to find the command to create manually the .debian.tar.gz file.
The best I've found is the following:
tar cv debian | gzip --no-name --rsyncable -9 > douane-testing_0.8.1-apt1.debian.tar.gz
I don't get the same result than dpkg-source but I tried the same command on my server (I should at least have the same size) but it's not matching...
Could it be that Debian and Ubuntu aren't compressing the same way ?
Finally after some evenings of research I found the solution on launchpad.net !
Found the solution. By default pbuilder calls dpkg-buildpackage like so:
DEBBUILDOPTS="$DEBBUILDOPTS -rfakeroot"
dpkg-buildpackage -us -uc $DEBBUILDOPTS
That causes dpkg-buildpackage to rebuild the diff.gz and .dsc files. Add a -b in there, and it won't. It also means the resulting .changes file will only reference the .deb file. Which is what you want, I think.
The easy solution is to add a line to your .pbuilderrc:
DEBBUILDOPTS="-b"
My previous answer is alright but is not complete.
Then I had the issue that reprepro complains about the source tarball (.orig.tar.xz).
But it was normal as I wasn't doing the packages correctly.
I have written a bash script which I'm executing in VM for each Ubuntu series.
This script was always doing everything from scratch, and was using dh_make --createorig argument and here is the issue.
The correct way is to generate once (for example on Ubuntu precise) and then re-use the .orig.tar.xz file and no more use the --createorig argument of dh_make.
I hope this could help someone :-)

Resources