Preserving all capitalization in BibTeX

I have a huge .bib file generated automatically from Papers for Mac, and all the capitalization in the .bib is already the way I want it, but it doesn't have {} brackets around words like RNA.
Is there a way to force BibTeX to keep the capitalization rather than change some words to lowercase?

I agree with Killian that the right way to preserve capitalisation is to add {}s, but I don't recommend doing this everywhere, since the behaviour is wrong in some contexts and not automatisable. Instead, the right thing to do with BibTeX is the following:
Put book and article titles into title case (i.e., capitalising all significant words [1], but not protecting them yet);
Protect the capitals of all proper names, e.g., From {B}rouwer to {H}ilbert;
Protect the capitals of all technical acronyms, e.g., The definition of {S}tandard {ML}; and
Protect the initial word of a subtitle, e.g., {W}ittgenstein's Poker: {T}he story of a ten-minute argument.
Don't protect lowercase letters: this prevents Bibtex from converting the string to all-caps, which is required by some obscure bibliographical styles.
If you have been using a spell-checker, then its database will, with luck, contain nearly all of the information you need to capitalise properly: spell-checkers store information on which words are all-caps and which are capitalised as proper names. If you can programmatically match words against this, then you can generate your BibTeX database automatically. It takes more than a little work, but it's maybe a two-hour project.
Tiresomely, Bibtex can't be used to get all bibliographies right, since different citation styles actually have different lists of non-significant words. However, in practice hardly anyone ever cares about the differences, so one can come up with a standard list of non-capitalised words.
[1] - Significant words: "a", all two-letter actual words, "the", "and", "some", all one-word prepositions, and all one-word pronouns would, I think, be an acceptable list of non-significant words to nearly all publishers.
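The matching step described in this answer can be sketched in Python. This is only an illustration under stated assumptions: the function name protect_capitals and the proper_names set are hypothetical, the set stands in for whatever a spell-checker database would provide, and any word of two or more capital letters is treated as an acronym.

```python
import re


def protect_capitals(title, proper_names):
    """Brace-protect acronyms and the initial capital of known proper names.

    proper_names is an assumed, user-supplied set of capitalised words
    (for example, exported from a spell-checker's word list).
    """
    def protect(match):
        word = match.group(0)
        # Treat any run of two or more capital letters as an acronym.
        if len(word) > 1 and word.isupper():
            return "{" + word + "}"
        # Protect only the first letter of a known proper name.
        if word in proper_names:
            return "{" + word[0] + "}" + word[1:]
        return word

    return re.sub(r"[A-Za-z]+", protect, title)


print(protect_capitals("From Brouwer to Hilbert", {"Brouwer", "Hilbert"}))
print(protect_capitals("The definition of Standard ML", {"Standard"}))
```

Subtitle-initial words are not handled here; they would need a separate rule keyed on the colon.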

If you prefer to edit the BibTeX style (.bst) rather than the bibliography (.bib), you can search for occurrences of change.case$ in it. This is the function that changes the case of fields that are not people's names.
Typically, for the title field, you should find something like title "t" change.case$. Since you want the title unmodified, replace that by just title.

In that case you should just add {} around each entire title, which has the same effect and should be easy to do automatically.
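A minimal sketch of doing this automatically, in Python. It assumes each title field sits on a single line in the form title = {...}, which may not hold for every exported .bib file; the names TITLE_FIELD, is_fully_braced and brace_title_line are illustrative, not part of any BibTeX tool.

```python
import re

# Matches a single-line field such as:   title = {Some Title},
# "booktitle" is not matched because only whitespace may precede "title".
TITLE_FIELD = re.compile(r"^(\s*title\s*=\s*)\{(.*)\}(,?\s*)$", re.IGNORECASE)


def is_fully_braced(body):
    """Return True if one brace pair already encloses the whole body."""
    if not (body.startswith("{") and body.endswith("}")):
        return False
    depth = 0
    for i, ch in enumerate(body):
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            # The first brace closed before the end, so it is not an outer pair.
            if depth == 0 and i < len(body) - 1:
                return False
    return depth == 0


def brace_title_line(line):
    """Wrap the title value in an extra pair of braces, if not already wrapped."""
    m = TITLE_FIELD.match(line)
    if m is None:
        return line
    prefix, body, suffix = m.groups()
    if is_fully_braced(body):
        return line
    return prefix + "{{" + body + "}}" + suffix


print(brace_title_line("  title = {RNA folding in vivo},"))
```

Applying brace_title_line to every line of the file leaves all other fields untouched.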

I was getting the same issue with a title such as:
title = {blah blah AB blah AB blah}
turning out as:
"blah blah ab blah ab blah"
Using Charles Stewart's suggestion, I changed my title to:
title = {blah blah {A}{B} blah {A}{B} blah}
Now my title turns out right:
blah blah AB blah AB blah
Hope this helps.

One alternative to using {curly brackets} is this:
After you run pdflatex for the first time and then run bibtex, check your project folder for the .bbl file. This is the formatted bibliography that BibTeX generates from your .bib database.
Open this *.bbl file in an editor of your choice.
The file would look like this:
\begin{thebibliography}{10}
\expandafter\ifx\csname url\endcsname
\relax
\def\url#1{\texttt{#1}}
\fi
\expandafter\ifx\csname urlprefix\endcsname
\relax\def\urlprefix{URL }
\fi
\bibitem{label}.....
Edit this *.bbl file to meet your requirements and now run the pdflatex command on your .tex file. This should give you the desired result.
By this method you can edit the bibliography in any manner. You can even add names with accented characters.

Is there any way to assign properties to differentiate values which would otherwise be considered the same? (in C language)

Let me start by saying that I'm a beginner programmer who knows very basic C.
The exact code I'm working on doesn't matter here; it's just that I've run into a very odd problem (I'm messing around with ASCII graphics).
I want to differentiate between two 'O' characters. I could just use different characters, but I want it to look good. I tried using Unicode to differentiate them (since Latin O and Greek Omicron look the same), but that won't work if I send the program to someone else: my output goes to Command Prompt, which (at least to my understanding) must be manually configured to display Unicode characters.
I don't know if using structures would work, and I want to write this in C rather than an OOP language like Python (because I'm practicing C).
So is there any logic I could use to do this?
EDIT: I want to apply different behaviours to the two kinds of O, and there will be multiple Os of each type on screen (I know this is what objects do, but please help me with logic that works in C).
EDIT 2: Apparently my question is very vague so I'll elaborate
I'm having a 2d array that I'm using as a game screen
I want O to come in from each side (top bottom right left), scroll across the screen and exit from the other side
I already came up with the logic to make it work for a single side but the problem comes in when I want to make it work for all sides
My current logic can't differentiate between an O that has to go left and an O that has to go down.
That is why I want to differentiate between two instances of O.
(I haven't posted my code since I just need the logic and want to write the actual code myself; it's also probably very inefficient or convoluted.)
tldr: I want to differentiate between "O" and "O"
My current logic can't differentiate between an O that has to go left and an O that has to go down
Your problem is that you don't differentiate internal representation and external presentation.
You have two distinct objects that move across the screen. They both look like an O character. This doesn't mean your program should store them both as 'O', or as some variation of 'O' such as the omicron character. The program can store e.g. the numbers 1 and 2, or any two different objects that are convenient to work with, and display an O in place of either.
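The separation of stored value and displayed glyph can be sketched as follows. The sketch is in Python for brevity; the same idea carries over to C with an enum and a 2D int array. The names EMPTY, MOVE_LEFT, MOVE_DOWN, GLYPH, render and step are all illustrative, and collisions between two objects landing on the same cell are simply ignored (the later write wins).

```python
# Internal codes: what the program stores in each cell of the grid.
EMPTY, MOVE_LEFT, MOVE_DOWN = 0, 1, 2

# External presentation: both kinds of object are drawn as the same 'O'.
GLYPH = {EMPTY: ".", MOVE_LEFT: "O", MOVE_DOWN: "O"}


def render(grid):
    """Turn the grid of internal codes into the text shown on screen."""
    return "\n".join("".join(GLYPH[cell] for cell in row) for row in grid)


def step(grid):
    """Move every object one cell in its own direction; objects leaving the grid vanish."""
    rows, cols = len(grid), len(grid[0])
    new = [[EMPTY] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            kind = grid[r][c]
            if kind == MOVE_LEFT and c - 1 >= 0:
                new[r][c - 1] = kind
            elif kind == MOVE_DOWN and r + 1 < rows:
                new[r + 1][c] = kind
    return new


grid = [
    [EMPTY, EMPTY, MOVE_LEFT],
    [MOVE_DOWN, EMPTY, EMPTY],
    [EMPTY, EMPTY, EMPTY],
]
print(render(grid))
print()
print(render(step(grid)))
```

Only render knows about the letter O; the movement logic works entirely on the stored codes.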

regex: extract text between two string with text that match a specific word

I'm refactoring a very big C project and I need to find the parts of the code written by a specific programmer.
Fortunately, everyone involved in this project marks their own code with their email address in standard C-style comments.
Someone could say that this could be achieved easily with grep from the command line, but that is not my goal: I may need to remove these comments or substitute them with other text, so a regex is the only solution.
Ex.
/*********************************************
*
* ... some text ....
*
* author: user#domain.com
*
*********************************************/
From this post I found the right expression to search for C style comments which is:
\/\*(\*(?!\/)|[^*])*\*\/
But that is not enough! I only need the comments which contain a specific email address. Fortunately, the domain of the email address I'm looking for seems to be unique in the whole project, so this could make it simpler.
I think I must use some positive lookahead assertion, I've tried this one:
(\/\*)(\*(?!\/)|[^*](?=.*domain.com))*(\*\/)
but it doesn't run!
Any advice?
You can use
\/\*[^*]*(?:\*(?!\/)[^*]*)*#domain\.com[^*]*(?:\*(?!\/)[^*]*)*\*\/
See the regex demo
Pattern details:
\/\* - comment start
[^*]*(?:\*(?!\/)[^*]*)* - everything but */
#domain\.com - literal #domain.com
[^*]*(?:\*(?!\/)[^*]*)* - everything but */
\*\/ - comment end
A faster alternative (as the first part will be looking for everything but the comment end and the word #domain):
\/\*[^*#]*(?:\*(?!\/)[^*#]*|#(?!domain\.com)[^*#]*)*#domain\.com[^*]*(?:\*(?!\/)[^*]*)*\*\/
See another demo
In these patterns, I used an unrolled construct for (\*(?!\/)|[^*])*: [^*]*(?:\*(?!\/)[^*]*)*. Unrolling helps construct more efficient patterns.
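As an illustration, the first pattern can be used from Python's re module to both find and remove the matching comments. The sample source text and the variable names are made up for the example; only the pattern itself comes from the answer.

```python
import re

# The first pattern from the answer, unchanged.
COMMENT_WITH_DOMAIN = re.compile(
    r"\/\*[^*]*(?:\*(?!\/)[^*]*)*#domain\.com[^*]*(?:\*(?!\/)[^*]*)*\*\/"
)

# Made-up sample input: one comment by another author, one by the target domain.
source = """/* helper, author: alice#other.org */
int x;
/****
 * author: user#domain.com
 ****/
int y;
"""

matches = COMMENT_WITH_DOMAIN.findall(source)
cleaned = COMMENT_WITH_DOMAIN.sub("", source)

print(len(matches))
print(cleaned)
```

Because neither half of the pattern can cross a */, a match never starts in one comment and ends in another.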

How to indicate a word exception for stemming in Hunspell

I am using Hunspell to stem words for a SOLR instance. For the most part, it seems to be working well.
I'm using the OpenOffice dic/aff files.
However, there are some notable word exceptions, and I'd like to be able to remove these as candidates for stemming.
A great example is "skier", which stems to "sky" because of the following:
in the .dic file
sky/MDRSGZ
relevant rule in the .aff file
SFX R y ier [^aeiou]y
Is there any way to indicate that skier and only skier should be left alone?
Yeah, this is a very common thing; just remove the "R" flag:
sky/MDSGZ
But you may then want to add back in on another line "skier" and any other versions of it.
skier/MS
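If many entries need the same treatment, the flag removal can be scripted. A small sketch in Python, assuming the dictionary uses Hunspell's default single-character flags (no long or numeric FLAG setting in the .aff file); the function name drop_flag is illustrative.

```python
def drop_flag(entry, flag):
    """Remove one single-character affix flag from a .dic entry such as 'sky/MDRSGZ'."""
    word, sep, flags = entry.partition("/")
    if not sep:
        # The entry has no flags at all; leave it untouched.
        return entry
    remaining = flags.replace(flag, "")
    return word + "/" + remaining if remaining else word


print(drop_flag("sky/MDRSGZ", "R"))
```

The exception word itself (skier/MS above) still has to be added as its own line.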
I have had to make numerous changes to this file, and now really wish there was a better option.
For example
Butter -> Butt
Corner -> Corn
Easter -> East
And then another one that is really confusing,
Wind == Wound
On my site before we fixed it if you searched for wind like in "wind power" you ended up with a bunch of bruises and bloody wounds.
Because "wound" like in "I wound the clock" stemmed to wind.
We also decided to remove all RE prefixes, because of things like
remarkable -> mark
remove -> move
reset -> set
restore -> store
So if you know of a dictionary that is better for this, please let me know. (I think the main problem is that this dictionary is intended more for spell checking than for stemming.)
I would be willing to start and/or contribute to a git project for a real stemming dictionary to replace this spelling dictionary for everyone out there using this.
Have you tried FreeLing? It is open source.
A demo page is here:
http://nlp.lsi.upc.edu/freeling/demo/demo.php
When I pick English and POS tagging, I get the following result:
you wound the clock?
you wind the clock?
PRP VBD DT NN ?
Also, "skier" and "wind power" both get the noun stems. It is a great stemmer and analyzer.
I'm not sure about licensing. The download page:
http://devel.cpl.upc.edu/freeling/downloads?order=time&desc=1

Regular expression to replace (if|then)

I have some verse references in articles that I want to link to the adjacent verses file.
Example:
some text (Gen 2:15, 16), other text (Ex 4:12, 13) more.. etc.
I could replace the first one with the following regex:
\(Gen \1: \2, \3\)
Here I fixed the "1" (book=) and the "Gen"
But I couldn't figure out how to use an if/then construct so that I could give it a list of all the books (Gen|Ex|Lev, etc.) and have it replace Gen with book number "1", Ex with "2", and so on.
You need to somewhere define what all the book orders are. And you'll need to use some sort of scripting language, not just a plain old regex. For example, you could do something along the lines of:
books = ["Gen", "Ex", ..., "Rev"]
...and then replace book_name with books.index(book_name)+1
The exact code/syntax obviously depends on which language you choose to use.
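A sketch of that idea in Python. The book list is truncated for illustration, and the link format (the "verses/..." path) is a made-up placeholder, since the question does not show the real target; the names BOOKS, BOOK_NUMBER, REFERENCE and link_reference are likewise illustrative.

```python
import re

# Truncated for illustration; a real list would contain every book in order.
BOOKS = ["Gen", "Ex", "Lev"]
BOOK_NUMBER = {name: i for i, name in enumerate(BOOKS, start=1)}

# Matches references of the form "(Gen 2:15, 16)".
REFERENCE = re.compile(
    r"\((" + "|".join(map(re.escape, BOOKS)) + r") (\d+):(\d+), (\d+)\)"
)


def link_reference(match):
    """Replace one reference with a link that uses the book's number."""
    book, chapter, first_verse, _second_verse = match.groups()
    # Hypothetical URL scheme; substitute whatever the verses file expects.
    url = "verses/{}/{}#v{}".format(BOOK_NUMBER[book], chapter, first_verse)
    return '<a href="{}">{}</a>'.format(url, match.group(0))


text = "some text (Gen 2:15, 16), other text (Ex 4:12, 13) more"
linked = REFERENCE.sub(link_reference, text)
print(linked)
```

Passing a function to sub is what supplies the "if/then" step: the lookup in BOOK_NUMBER happens per match, which a plain replacement string cannot do.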
With Notepad++ you won't be able to get the order numbers.
But everything else is possible. You need to put each book on a new line:
find \), and replace by \n
Then use this pattern:
[a-z\s]+\(([a-z]+)\s+([0-9:]+)\,\s+([0-9]+)\)
and replace by:
\1: \2, \3
You'll get the list of URLs, which you can then merge back into one line if needed.
The only problem is the book number.
Demo is here: https://regex101.com/r/qN8mO7/2

Calling functions from plain text descriptions

I have an app which has common maths functions behind the scenes:
add(x, y)
multiply(x, y)
square(x)
The interface is a simple Google-style text field. I want the user to be able to enter a plain text description -
'2*3'
'2 times 3'
'multiply 2 and 3'
'take the product of 2 and 3'
and get a mathematical answer.
The question is, how should I map the text descriptions to the functions? I'm guessing I need to
tokenise the text
identify key tokens (function names, arguments)
try and map token combinations to function signatures
However, I'm guessing this is already a 'solved problem' in the machine learning space. Should I be using Natural Language Processing? Plain text search? Something else?
All ideas gratefully received, plus implementation suggestions [I'm using Python/AppEngine; I know about NLTK and Whoosh]
[PS I understand Google does this already, at least for the first two queries on the list. I'm guessing they also do it statistically, having a very large amount of search data. I don't have a large amount of data available, so I will need an alternative approach.]
After you tokenise the text, you need parsing to get a syntax tree of your natural language phrase. Once you have this, you can map the parse tree to a mathematical expression, and then evaluate the expression. I do not think this is a solved problem. I would start with several templates, say the first two, and experiment. The larger the domain of possible descriptions, the harder the task is.
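The "start with several templates" suggestion can be sketched in Python with regular expressions standing in for a real parser. This is only a starting point under stated assumptions: the template list, the NUM pattern and the evaluate function are illustrative, the three maths functions are minimal stand-ins for the ones in the question, and anything outside the listed phrasings returns None.

```python
import re


def add(x, y):
    return x + y


def multiply(x, y):
    return x * y


def square(x):
    return x * x


# A signed integer or decimal number.
NUM = r"(-?\d+(?:\.\d+)?)"

# Each template pairs a phrasing with the function it maps to.
TEMPLATES = [
    (re.compile(rf"^{NUM}\s*\*\s*{NUM}$"), multiply),
    (re.compile(rf"^{NUM}\s+times\s+{NUM}$"), multiply),
    (re.compile(rf"^multiply\s+{NUM}\s+and\s+{NUM}$"), multiply),
    (re.compile(rf"^(?:take\s+)?the\s+product\s+of\s+{NUM}\s+and\s+{NUM}$"), multiply),
    (re.compile(rf"^{NUM}\s*\+\s*{NUM}$"), add),
    (re.compile(rf"^square\s+{NUM}$"), square),
]


def evaluate(text):
    """Return the result for a recognised description, or None if no template fits."""
    query = text.strip().lower()
    for pattern, func in TEMPLATES:
        m = pattern.match(query)
        if m:
            return func(*(float(g) for g in m.groups()))
    return None


for query in ["2*3", "2 times 3", "multiply 2 and 3", "take the product of 2 and 3"]:
    print(query, "->", evaluate(query))
```

Each new phrasing needs its own template, which is the scaling problem the answer points out: the larger the domain of descriptions, the sooner a proper grammar becomes necessary.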
I would recommend a tool that lets you define a grammar or patterns over text, like SimpleParse for Python: http://www.ibm.com/developerworks/linux/library/l-simple.html. As a Java programmer I would prefer GATE or graph-expression.
