How to remove accents from string in C?

Is there a more efficient way to remove accents from a string without making a big array of the characters to replace?
For example:
removeaccents("áèfoo")
Output:
aefoo
There are no accents in the ASCII table, so I have no idea how to do this. Thanks in advance. :)

Sounds like you're looking for unac(). From the man page:
unac is a C library that removes accents from characters, regardless of the character set (ISO-8859-15, ISO-CELTIC, KOI8-RU...) as long as iconv(3) is able to convert it
into UTF-16 (Unicode).
I couldn't find the download page (I think it's meant to be here, but the link is currently returning a 404). If you're on Ubuntu, you can get it with:
sudo apt-get install libunac1-dev
Assuming you're using gcc, once it's installed you'll need to add -lunac to your compiler options (to tell the compiler to link with the unac library).

Is there a way to get help for some C functions inside of vim/Neovim?

This question may be a little off topic, but I was wondering whether there is a way to look at the descriptions of C functions from within Vim or Neovim. Is it possible to view their documentation with something like :help? This would be really helpful, since I wouldn't need to switch to my browser every time.
I am unclear about these things:
Can :help be my friend here ?
Can I use an LSP server to do something like this?
I am using the latest Neovim on Ubuntu 20.04 in WSL. Is this helpful somehow?

By pressing K, the keyword under the cursor is looked up using a configured keyword lookup program, the default being man. This works pretty much out of the box for the C standard library.
For C++, you might want to look at something like cppman.

Well yes, you can get the description of C functions by using an LSP (Language Server Protocol) server! I use clangd as my language server.
You'd "just" need to install the language server and start it. I don't know how familiar you are with Neovim, so in case you don't know how to install a plugin or, more specifically, a language server, you can do the following:
There are plenty of videos on how to set up a language server for your language. Here's an example.
If you don't want to set it up on your own, you can pick one of the preconfigured Neovim setups (some of my friends recommend LunarVim).
But yeah, that's it. If you have any further questions, feel free to ask them in the comments.
Happy vimming c(^-^)c

Let's explain how the K command works in more detail.
You can run external commands by prefixing them with :!. So running the man tool is as easy as
:!man <C-R><C-W>
Here <C-R><C-W> is a special key combination that inserts the word under the cursor into the command line.
The same works for showing Vim's built-in help page:
:help <C-R><C-W>
Because typing that is tedious, Vim also defines the K Normal mode command, which does pretty much the same thing, except that the tool name is taken from the value of the 'keywordprg' option.
So set keywordprg=man (the default on *nix systems) makes K invoke :!man, while set keywordprg=:help makes it use the built-in help.
Also, the 'keywordprg' option (see :h 'keywordprg') is global or local to buffer, so any buffer can override the global setting. For example, the standard runtime already does this for "vim" and "help" buffers, so they call ":help" instead of "man".
The problem with the :!man command is that it shows a "black console". It'd be nice if we could capture man's output and open it inside Vim just like a built-in help page. Then we could also apply some pretty highlighting, assign key mappings and so on. This is a pretty common trick, and it is already done by a standard plugin shipped with Vim/Neovim.
The command that the plugin provides is called :Man, so you can run :Man man instead of :!man man, for example. The plugin is enabled by default in Neovim; in Vim you still need to source one file manually. So to make use of this plugin you'll need something like this
set keywordprg=:Man
if !has("nvim")
source $VIMRUNTIME/ftplugin/man.vim
endif

The previous answer recommending cppman is the way to go. There is no need to install a bulky language server just for the purpose of having the hover functionality. However, make sure you're caching the man pages via cppman -c. Otherwise, there will be a noticeable delay since cppman is fetching the page from cppreference.com on the fly.
If you like popups for displaying documentation, convert the uncompressed man pages (groff -t -e -mandoc -Tascii <man-page> | col -bx), and set keywordprg to your own wrapper to search for keywords according to your needs.

(Android NDK) Strings containing non-ASCII characters get cut off

I'm trying to port a program written in C to Android using the NDK and JNI, and I'm stuck with a ridiculous problem which is driving me crazy.
To make it short, if I do this...
char str[1024];
sprintf(str, "Hellö, this is söme stränge letters.");
...strlen(str) returns 35, as expected. Right?
But if I include a specifier, and do this...
char str[1024];
sprintf(str, "Hellö again. Here's a number: %d", 1);
...strlen(str) returns 4.
Do you see what's happening? It appears the NDK can't (or won't?) accept non-ASCII characters in strings, if I try to format them.
Any time I include a non-ASCII character (byte value >127) in the format string, the output just gets cut off there, as if it were NUL-terminated at that point.
Is this a bug? Is this expected behaviour?
Ultimately, my question is: What can I do to solve this?
Many thanks in advance.

A "preview" version of Android 5.0 had some issues that were fixed in the final release. See this bug report for more information.
If you get a hex dump of the .o file (with e.g. xxd on Linux) and search for a fragment of the string, you can see how it's encoded in the executable. If it's valid UTF-8 -- I get c3 b6 for 'ö' when I compile with desktop gcc -- then it should work. If it's using some other encoding, the Android libc may reject it as invalid.
If the string in the binary doesn't appear to be UTF-8, check your makefiles for things like -fexec-charset=.

How to make ncurses display UTF-8 chars correctly in C?

I have a program written in C using ncurses. It lets the user enter text and displays it. It does not display correctly if the user enters UTF-8 characters.
I saved the characters the user entered to a file. When I cat this file directly in the shell, it displays correctly.
I searched Stack Overflow and Google and tried several methods, such as linking with ncursesw, but it still displays incorrectly.
Running ldd /usr/bin/screen shows: libncurses.so.5 => /usr/lib64/libncurses.so.5
Yet screen displays what the user enters correctly.
How do I make ncurses display UTF-8 characters correctly?
What is the general way to display UTF-8 characters in C using ncurses?

You need to have called setlocale(LC_CTYPE, ""); (with a UTF-8 based locale configured) before initializing ncurses. You also need to make sure your ncurses is actually built with wide char support ("ncursesw") but on modern distros this is the default/only build.

# You also need these on top of the installation and locale settings
# At the very least, check the locale
locale
locale-gen en_US.UTF-8
# vim ~/.bashrc  # once it works, add these 3 lines to make the setting permanent
export LANG=en_US.UTF-8
export LANGUAGE=en_US.UTF-8
export NCURSES_NO_UTF8_ACS=1

Retrieving Global Variable Values from Command Line

In one particular project, we're trying to embed version information into shared object files. We'd like to be able to use some standard linux tool to parse the shared object to determine the version for automated testing.
Currently I have "const int plugin_version = 14;". I can use 'nm' and 'objdump' and verify that it's there:
00000000000dcfbc r plugin_version
I can't, however, seem to be able to get the value of that variable easily from command line. I figured there'd be a POSIX tool for showing the initialized values for globals. I have contemplated using a format for the variable as the information itself, ie, plugin_version_14, but that seems like a huge hack. Embedding the information in the filename unfortunately is NOT an option. Any other suggestions welcome.

You could embed it as a string
"MAGIC MARKER STRING VERSION: 4.56 END OF MAGIC" then just look for "MAGIC MARKER STRING" in the file and extract the version information that comes after it.
If you make it a standard, you could easily write a command-line tool to find these embedded strings in all your software.
If you also require it to be an int, a little macro magic will construct both the int and the magic string from one definition, so they can never get out of sync.

There are a couple of options, I think.
My first instinct is to make sure the version information lives in its own section in the ELF file. You can then use objdump -s -j <section-name> /bin/whatever.
This rather relies on objdump being available of course.
Alternatively you can do what Keith suggested, and just use 'strings', along with a magical marker string. This feels a little hackish, but should work quite well.
Finally, why don't you just add a --version command line option? You can then store the version information however you like, and trivially retrieve it using the one tool which is certain to be installed on any system which has your software.

A terrible hack that I've used in the past is to embed the version information in a variable name, so nm will show:
00000000000dcfbc r plugin_version_14

Why not write your own tool in C/C++ to get that version? You could use dlopen, then dlsym, to get the symbol and print its value to standard output. This way you also verify that the symbol is actually there. It looks like 20 to 30 lines of code to me and about 20 minutes of your life :)
I know that the question is about command line, but writing such a tool yourself should be easy (especially if such a command line tool does not exist).

If the binary is not stripped, you could use gdb to print the variable. (I just tried to script gdb, but it seems to refuse to work if stdin is not a tty; maybe expect will do the job?)

If you can accept using python, this might help:
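import struct
import subprocess
import sys

if __name__ == '__main__':
    so = sys.argv[1]
    sym = sys.argv[2]
    # nm prints "<address> <type> <name>"; take the address from the first match
    out = subprocess.check_output('nm %s | grep %s' % (so, sym), shell=True)
    addr = int(out.split()[0], 16)
    # Assumes the symbol's address equals its offset in the file
    with open(so, 'rb') as so_file:
        so_file.seek(addr)
        data = so_file.read(4)
    # '=i' reads a 4-byte int in native byte order
    print(struct.unpack('=i', data)[0])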
Disclaimer: This script doesn't do any error checking (if you like it I'm sure you can come up with some ;)). It also assumes you're reading a 4-byte native int value.
$ cat global.c
const int plugin_version = 14;
$ python readsym.py global.so plugin_version
14

Bolding text in console output

For extra credit, the professor wants us to use bolding and/or underlining in the text output of the current project.
The example he gave was b\bb o\bo l\bl d\bd is displayed as b o l d
Following that example, I marked up SPACE as
printf("\033[7mS\bSP\bPA\bAC\bCE\E- move forward one page\033[0m");
I'm also implementing reverse video by enclosing strings within \033[7m and \033[0m fields. The reverse video inverts the colors of the line appropriately, but doesn't seem to be affecting the bolding, since both strings with and without the reverse video are not bolding.
Could it be the standard shell used in Ubuntu 10.10 that is at fault?

I agree about using curses, but given your starting point ....
I think you want to use the 'bright' feature of VT100 for the bold: ESC[1m
You can probably find better documentation on VT100 codes, but I found the codes on this page: ANSI/VT100 Escape Codes
I hope this helps.

Unless you're just trying to be masochistic, try using curses (or ncurses) instead.
// warning: Going from distant memory here...
attron(A_REVERSE); // reverse video; use A_BOLD for bold
addstr("SPACE - move forward one page");
attroff(A_REVERSE);