bedtools shuffle -excl doesn't seem to warn about inconsistent naming conventions - shuffle

After using bedtools shuffle a lot with -excl set, I noticed that the exclusion file uses the numeric naming convention (e.g. "1") whereas my input file uses the "chr" naming convention (e.g. "chr1"). Does anyone know if this will cause any problems? I have generated many shuffled files without any errors or warnings and everything seems fine, so I would like to know before I end up needing to redo everything.

OK, so I quickly ran a small test dataset, which confirmed that -excl only works when the naming conventions are consistent, and that it doesn't warn the user when they are inconsistent. This is pretty worrisome IMO.
(bedtools v2.30.0)
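Since no warning is printed, one way to catch this before shuffling is to compare the chromosome names in the two files yourself. Below is a minimal sketch, assuming plain whitespace-delimited BED input; the helper names chrom_names and shared_chroms are made up for illustration and are not part of bedtools.

```python
def chrom_names(bed_lines):
    """Collect the chromosome column (first field) from BED-formatted lines."""
    names = set()
    for line in bed_lines:
        line = line.strip()
        # Skip blank lines and the optional header-style lines of a BED file.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        names.add(line.split()[0])
    return names


def shared_chroms(input_lines, excl_lines):
    """Return the chromosome names that occur in both files."""
    return chrom_names(input_lines) & chrom_names(excl_lines)


if __name__ == "__main__":
    import sys

    # Hypothetical usage: python check_chroms.py input.bed exclude.bed
    with open(sys.argv[1]) as a, open(sys.argv[2]) as b:
        common = shared_chroms(a, b)
    if not common:
        print("No chromosome names in common; check 'chr1' vs '1' naming.")
```

An empty intersection means no exclusion interval can match any input chromosome by name, which is the situation described above.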

Related

Partially pre-compile code (or maybe use .so library) while leaving another part of code open to edits

I'm trying to do a somewhat odd thing that realistically I'm not sure is even possible with current constraints but is outside of my scope of knowledge so it could be. I'll hopefully be able to make everything clear enough in the question, but it will be a little broad in scope; it's too big to get detailed.
Anyway, I have a C codebase (we'll call it bar) that is rather large and takes a bit of time to compile. Not a huge deal normally, but now there is a set of files that are changed often and currently the changes can only be confirmed as good after running a compile. Due to the nature of how these are changed it could result in people running multiple compiles in a day, taking quite a lot of time.
What I want to do on a broad scale is only have to actually compile the set of files that might change (about 20, all in 1 directory, we'll call it foo) and have everything else (bar and everything under it except for foo) ready beforehand. Initially I was looking at a .so library for the task, but I'm no longer positive that's correct. Either way, it still seemed likely to be reasonably possible until I realized that some of the files in directory foo were included by other files in bar. Mostly the files in foo only include files and are kind of the end point, not being included in things. But with a few of them being included I'm not sure what can be done.
My two thoughts are to generate a .so library of everything outside of foo that somehow still checks on the needed included files at compile time, or to get some kind of general pre-compile setup. Neither of these seems like it would work at a glance, but I very well could be wrong.
A third option, less ideal but better than nothing, is to generate the .so library with everything including any files in foo that are needed at that point, just leaving out the files that aren't included anywhere. It seems like this would work better, though even if it would I'm still not really sure how to go about it.
So basically, is there a way to do what I want to some extent, and if so what is the best method?
Sorry about the broadness of the question, the codebase is too large to provide lots of detail. I will try to edit and add in any information that people think is needed though. Thanks for the help.

Selective compilation of source code

I am working on a C project which is quite large and consists of multiple source files. I have written a script to find out all the functions in this code that are never used (Only defined once but never used elsewhere). Now I want to compile my code without including these functions. Is there any direct way to exclude certain functions from a compilation?
I understand that I can use #ifdef/#endif for each of these functions and leave them out, but inserting these at the right location using a script is turning out to be really challenging, hence the question.
PS: I have already used all compiler/linker optimizations and this exercise is supposed to be beyond those (as no optimization has been successful in removing 100% dead code and I don't expect it to). So I am not really looking for answers in that area.

splint whole program with a complex build process

I want to run Splint's whole-program analysis on my system. However, the system is quite large and different parts are compiled with different compiler defines and include paths. I can see how to convey this information to Splint for a single file but I can't figure out how to do it for the whole program. Does anyone know a way of doing this?
Assuming you have a Makefile you could create a new target; then you would go through the actual compilation steps to duplicate them using Splint instead of the compiler.
My advice, however, is against the full-program approach. If you can isolate your system into separate parts, I'd rather start by checking them, one by one. Since your program is "quite large", expect a gazillion warnings... for each one of your modules. You will start to get rid of them once you have sprinkled your source code with the appropriate semantic annotations. Good luck! :)

To remove #ifdef DEBUG parts for release or not?

When releasing source code for someone else to see, when coding style is not well defined
(no pun intended)
do you remove the #ifdef DEBUG parts?
(that is the parts that are compiled only when DEBUG is defined)
If I remove it, it makes the code look better (or me look better - do I really want someone to know I've debugged, and how I've done it?), but then I'll lose my debug parts, or have to keep two (or more) versions of the code.
What is to be done?
I think if your debug code is clean and has "professional" language in any logging statements, it's okay to leave it in. If the debug code is sloppy or has debug messages like "I'm here...," "Now I'm here..." you should take it out.
If your debug statements reflect the fact that there are issues that you can't figure out, it might be best to take them out, if you're trying to "sell" your software to someone. (Hopefully you can fix them later...)
You should leave the code as is, unless you use inappropriate language in your comments. If someone is to use your code, chances are they'll need those debug parts, or they will help them understand your code. (This is also true for comments.)
Edit: I have often worked on code drops from other studios in the past. I have seen debug code, dead paths and many other things; still, the only thing I hated was people who strip their code of debug code and comments, as this makes their code really hard to maintain.
I also vote to leave it in. If/when you start work on your first patch, you'll likely need those DEBUG-blocked pieces. Also, QA won't love it that you removed the code, even if it is blocked in a directive.
If you do decide to remove them, just filter them out with a script when exporting the code, no need to maintain two versions.
Maintain your base version with everything in your source code management system.
Then if you want to distribute source code filtered in one or more ways, make a script that will make a release version of your source code.
Do not maintain these secondary filtered repositories, make them always generated.
But is it worth the time? Probably not, and you should probably just distribute everything including the #ifdef DEBUG parts.
Maintaining multiple versions of ANYTHING is undesirable.
Only do so if you must.
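For anyone who does go the generated-release route, the export-time filter suggested in the answers above could look roughly like this. It is only a sketch: strip_debug_blocks is a made-up name, it recognises the literal form #ifdef DEBUG only (not #if defined(DEBUG) or #elif), and it ignores directives hidden in comments or strings.

```python
import re

_DEBUG_OPEN = re.compile(r"^\s*#\s*ifdef\s+DEBUG\b")
_ANY_OPEN = re.compile(r"^\s*#\s*if(def|ndef)?\b")
_ELSE = re.compile(r"^\s*#\s*else\b")
_ENDIF = re.compile(r"^\s*#\s*endif\b")


def strip_debug_blocks(source):
    """Drop '#ifdef DEBUG' branches, keeping any '#else' branch as plain code."""
    out = []
    # One entry per open conditional: "debug" (inside the DEBUG branch),
    # "release" (inside its #else branch) or "other" (unrelated conditional).
    stack = []
    for line in source.splitlines(keepends=True):
        if _DEBUG_OPEN.match(line):
            stack.append("debug")
            continue  # the directive itself is removed
        if _ANY_OPEN.match(line):
            stack.append("other")
        elif _ELSE.match(line) and stack and stack[-1] == "debug":
            stack[-1] = "release"
            continue
        elif _ENDIF.match(line) and stack:
            if stack.pop() != "other":
                continue  # closing #endif of a DEBUG block
        # Emit the line unless some enclosing DEBUG branch is active.
        if "debug" not in stack:
            out.append(line)
    return "".join(out)
```

Running this over each file at export time keeps the full source as the only maintained version, which is the point made above about never maintaining the filtered copy by hand.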

Typical C with C Preprocessor refactoring

I'm working on a refactoring tool for C with preprocessor support...
I don't know the kind of refactoring involved in large C projects and I would like to know what people actually do when refactoring C code (and preprocessor directives)
I'd like to know also if some features that would be really interesting are not present in any tool and so the refactoring has to be done completely manually... I've seen for instance that Xref could not refactor macros that are used as iterators (don't know exactly what that means though)...
thanks
Anybody interested in this (specific to C), might want to take a look at the coccinelle tool:
Coccinelle is a program matching and transformation engine which provides the language SmPL (Semantic Patch Language) for specifying desired matches and transformations in C code. Coccinelle was initially targeted towards performing collateral evolutions in Linux. Such evolutions comprise the changes that are needed in client code in response to evolutions in library APIs, and may include modifications such as renaming a function, adding a function argument whose value is somehow context-dependent, and reorganizing a data structure. Beyond collateral evolutions, Coccinelle is successfully used (by us and others) for finding and fixing bugs in systems code.
Huge topic!
The stuff I need to clean up is contorted nests of #ifdefs. A refactoring tool would understand when conditional stuff appears in argument lists (function declaration or definitions), and improve that.
If it was really good, it would recognize that
#if defined(SysA) || defined(SysB) || ... || defined(SysJ)
was really equivalent to:
#if !defined(SysK) && !defined(SysL)
If you managed that, I'd be amazed.
It would allow me to specify 'this macro is now defined - which code is visible' (meaning, visible to the compiler); it would also allow me to choose to see the code that is invisible.
It would handle a system spread across over 100 top-level directories, with varying levels of sub-directories under those. It would handle tens of thousands of files, with lengths of 20K lines in places.
It would identify where macro definitions come from makefiles instead of header files (aargh!).
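The #if equivalence wished for above can at least be checked by brute force once the set of valid configurations is known. The sketch below uses a toy set of four platform macros (the real list is elided in the answer, so these are stand-ins) and assumes the two forms are meant to agree only when exactly one platform macro is defined per build; that assumption is mine, not the answer's.

```python
from itertools import combinations

# Toy set of platform macros (an assumption; the real list is elided above).
SYSTEMS = ("SysA", "SysB", "SysK", "SysL")


def long_form(defined):
    # Stands in for: #if defined(SysA) || defined(SysB)
    return "SysA" in defined or "SysB" in defined


def short_form(defined):
    # Stands in for: #if !defined(SysK) && !defined(SysL)
    return "SysK" not in defined and "SysL" not in defined


def equivalent(cond_a, cond_b, configs):
    """True if both conditions select the same code in every configuration."""
    return all(cond_a(c) == cond_b(c) for c in configs)


# Builds that define exactly one platform macro.
one_per_build = [frozenset([s]) for s in SYSTEMS]

# Every possible combination of macros, including "nothing defined".
any_subset = [
    frozenset(c)
    for n in range(len(SYSTEMS) + 1)
    for c in combinations(SYSTEMS, n)
]

print(equivalent(long_form, short_form, one_per_build))  # True
print(equivalent(long_form, short_form, any_subset))     # False
```

The second check fails because with nothing defined the long form is false and the short form is true, so a tool could only make this rewrite if it were told which macro combinations can actually occur.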
Well, since it is part of the preprocessor... #include refactoring is a huge huge topic and I'm not aware of any tools that do it really well.
Trivial problems a tool could tackle:
Enforcing consistent case and backslash usage in #includes
Enforce a consistent header guarding convention, automatically add redundant external guards, etc.
Harder problems a tool could tackle:
Finding and removing spurious includes.
Suggest the use of predeclarations wherever practical.
For macros... perhaps some sort of scoping would be interesting, where if you #define a macro inside a block, the tool would automatically #undef it at the end of a block. Other quick things I can think of:
A quick analysis on macro safety could be helpful as a lot of people still don't know to use do { } while (0) and other techniques.
Alternately, find and flag spots where expressions with side-effects are passed as macro arguments. This could possibly be really helpful for things like... asserts with unintentional side-effects.
Macros can often get quite complex, so I wouldn't try supporting much more than simple renaming.
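As a rough idea of what flagging side effects in macro arguments could look like, here is a text-level sketch. It is a heuristic under stated assumptions: it looks only for ++, -- and = style assignment inside the arguments of a named macro, it misses <<= and >>= as well as function calls that have side effects, and it does not skip comments or string literals. The function names are made up for illustration.

```python
import re

# ++, --, or an assignment '=' that is not part of ==, !=, <= or >=.
_SIDE_EFFECT = re.compile(r"\+\+|--|(?<![=!<>])=(?!=)")


def macro_call_args(source, macro):
    """Yield the raw argument text of each macro(...) call, matching parens by depth."""
    for m in re.finditer(r"\b" + re.escape(macro) + r"\s*\(", source):
        depth, start = 1, m.end()
        for i in range(start, len(source)):
            if source[i] == "(":
                depth += 1
            elif source[i] == ")":
                depth -= 1
                if depth == 0:
                    yield source[start:i]
                    break


def risky_calls(source, macro):
    """Return the argument text of calls whose arguments appear to modify state."""
    return [a for a in macro_call_args(source, macro) if _SIDE_EFFECT.search(a)]
```

For example, with the source text "assert(n++ < 10); assert(a == b);" and the macro name "assert", only the first call is reported. A real tool would want to work on a parsed representation of the code, not on raw text.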
I will tell you honestly that there are no good tools for refactoring C++ like there are for Java. Most of it will be painful search and replace, but this depends on the actual task. Look at Netbeans and Eclipse C++ plugins.
I've seen for instance that Xref could not refactor macros that are used as iterators (don't know exactly what that means though)
To be honest, you might be in over your head - consider if you are the right person for this task.
If you can handle reliable renaming of various types, variables and macros over a big project with an arbitrarily complex directory hierarchy, I want to use your product.
Just discovered this old question, but I wanted to mention that I've rescued the free version of Xrefactory for C, now named c-xrefactory, which manages to do some refactorings in macros such as rename macro, rename macro parameter. It is an Emacs plugin.
