Currently there exist packages like gonzui (see an example implementation here) for doing source code search.
Is there a similar package that does the same thing, but for simple file search?
Basically, I have two lists of files, for file type A and file type B. When the user types a word in the search box, all files (in "gz" format) from types A and B whose names match the search term should be displayed.
Is there a ready-made package that does this?
I am aware of a CGI implementation via Perl, but it is difficult for me to build a simple and elegant interface/display with CGI.
We use OmniFind, which works pretty well. You might also look into Nutch or Lucene.
Do you need it open-source and/or free?
Do you need full unicode support?
Also, do you want a search or an index? A search does not use any pre-computed information: for every query you have to process all the file data.
For an index, you would have to pre-process/index the file data up front.
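To make the distinction concrete, here is a minimal Python sketch (my own illustration; the function names are made up and it is not taken from any package mentioned here):

    import os

    # A "search" walks the whole tree and tests every name on every query.
    def search(root, term):
        matches = []
        for dirpath, _dirs, filenames in os.walk(root):
            for name in filenames:
                if term in name:
                    matches.append(os.path.join(dirpath, name))
        return matches

    # An "index" pre-computes the file list once; queries then only scan
    # the in-memory list instead of hitting the file system each time.
    def build_index(root):
        return [os.path.join(dirpath, name)
                for dirpath, _dirs, filenames in os.walk(root)
                for name in filenames]

    def lookup(index, term):
        return [p for p in index if term in os.path.basename(p)]

The trade-off is exactly as described: the index answers queries quickly but must be rebuilt (or updated) when files change.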
dtSearch is a commercial (non-free) index engine.
The fact that you mention a "database" suggests to me that you are looking for an index.
There are hooks into the Microsoft Indexing Service, and you can also use MS SQL Server to index text data.
I am not quite sure I understand what you're looking for, or what your use case is exactly.
However, off the top of my head, there's the grep family of tools (grep, fgrep, egrep).
There's also find, which I think is more along the lines of what you're looking for.
And if performance matters, there's locate, which is based on an index that you will have to update periodically.
All of these come pre-installed with most flavors of UNIX.
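If you end up rolling your own on top of the same idea, find-style name matching is easy to reproduce; here is a hedged Python sketch that mimics find <root> -name '*<term>*.gz' (the directory names are placeholders for the question's type-A and type-B lists):

    import fnmatch
    import os

    def find_gz(root, term):
        """Yield paths of .gz files whose names contain `term`,
        roughly equivalent to: find <root> -name '*<term>*.gz'"""
        pattern = "*%s*.gz" % term
        for dirpath, _dirs, filenames in os.walk(root):
            for name in fnmatch.filter(filenames, pattern):
                yield os.path.join(dirpath, name)

    # Placeholder directories for the two file lists:
    for path in list(find_gz("typeA", "report")) + list(find_gz("typeB", "report")):
        print(path)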
I hope this helps.
Just for self-education, I decided to implement a "hello world" distributed file system, the simplest one possible, and to read up on the theory behind the subject.
But when I ask Google about this, it shows answers like "how to configure HDFS" or "how to set up a distributed FS on Windows", which is not what I am interested in...
Could someone please point me to some good articles or books on this subject?
Thanks a lot!
Well, if you have really decided to implement such a file system, you must start with distributed systems in general. I recommend reading Tanenbaum's reference book: http://www.distributed-systems.net/index.php?id=distributed-systems-principles-and-paradigms
Be careful: the subject is really complex, and distributed systems are anything but simple to implement.
If you want to look at some already-implemented distributed file systems, you could check GFS/GFS2 (from Red Hat) or OCFS2 from Oracle.
You may also have a look at Gluster: https://fr.wikipedia.org/wiki/GlusterFS
You should also be able to find the white paper on the Google File System (from when it was still a university project).
The main problem of such a distributed system is failure detection: detecting when a node crashes while writing to the file system, so you can make sure there are no corruptions. There are multiple strategies; one is to implement a journal protected by a distributed lock.
Another great (classical) problem is the 'split brain' problem, when the cluster is split into two groups because of a network failure (imagine a broken switch). Both groups 'think' the other one is dead (they cannot communicate with it), but there is no way to make sure the distant group is not still writing data, causing the two sides to diverge.
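To illustrate the standard defence, here is a tiny Python sketch of a strict-majority quorum check (the node names and the 5-node cluster are purely illustrative):

    def has_quorum(visible_nodes, cluster_size):
        """A partition may keep writing only if it sees a strict majority."""
        return len(visible_nodes) > cluster_size // 2

    # A 5-node cluster splits 3/2 after a switch failure:
    group_a = {"node1", "node2", "node3"}
    group_b = {"node4", "node5"}

    print(has_quorum(group_a, 5))  # True  -> this side keeps serving writes
    print(has_quorum(group_b, 5))  # False -> this side fences itself off

Because at most one side can ever hold a strict majority, the two groups can never both keep writing, which is what prevents divergence.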
Hope you find what you want with all this.
Edit: GFS is now deprecated; Red Hat is using and developing 'Ceph' instead.
A while ago I started to learn the C language, and I have since spent several hours searching for THE miracle piece of software.
I am looking for software that imports the sources of a C program (.c files) and generates a "mind map" of the code, with all the files, functions, variables, etc.
Do you know if such a thing exists? It would help me a lot in understanding the architecture of complex software.
Thank you very much for all your answers.
Take a look at the "call graph". This sort of visualization should get you started.
As the comment suggests, Doxygen is a good open-source tool. Take a look at some sample output here. Doxygen is straightforward to configure for call-graph generation under *nix; it's a little more complex on Windows. First, check out this SO post: how to get doxygen to produce call & caller graphs for c functions. Doxygen's HTML output provides a number of nice cross-referencing features (files, variables, structs, etc.) in addition to caller/callee graphs.
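For reference, the relevant Doxyfile settings look roughly like this (a sketch; check your Doxygen version's documentation for exact names and defaults):

    # Doxyfile excerpt: enable call/caller graphs (requires Graphviz/dot)
    EXTRACT_ALL   = YES
    HAVE_DOT      = YES
    CALL_GRAPH    = YES
    CALLER_GRAPH  = YES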
On the commercial side, Understand for C/C++ has first-rate visualization features. Google "c call graph diagram" for other commercial and open-source options.
Finally, there are some older SO posts worth a look, like this one: Tools to get a pictorial function call graph of code.
Look into the program ctags. It is an indexer of names and functions based on the structure of the programming language.
It is quite mature, and has integration with a number of other tools. I use it with an older (but very nice) text editor called vi, but it can be used independently from the command line.
It does not generate a graphical view of the connections. However, in my estimation there are probably too many connections in most C programs to display visually without creating a large amount of information overload.
This answer differs from Throwback's answer in some interesting ways. A "call graph" can mean a few things: the path a running program actually took through a section of code; the combination of all paths a running program might take through the code; or the combination of all paths in the code, whether they can be reached or not.
Your needs will drive which tool you should use.
I have the following scenario: I need a DB to store XML messages that have been created by a reader. I then want to use a transport (WCF) to read the DB, external to the populating app, and send the messages to a central DB. Generally, the DB needs to run on Mono and on Windows.
I did look at SQLite3, and it seemed to fit all my requirements, but I'm reading that it's not so good with multi-process access, so it has been moving away from my sweet spot these last couple of days.
Thanks.
Have you considered just using XML files to store the data? It doesn't get any more portable than that, and it will work fine as long as your client-side storage needs are simple, e.g. not a large number of domain objects to store.
Additionally, an XML data store avoids a lot of setup and installation headaches: you simply reference a file (or files) relative to your executable, and you don't need to worry about installing DB engines for a variety of platforms and then upgrading them.
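As a rough sketch of what that can look like (shown in Python for brevity; the file and element names are made up, and the same idea maps to System.Xml under Mono/.NET):

    import xml.etree.ElementTree as ET

    STORE = "messages.xml"  # hypothetical store file next to the executable

    def append_message(text):
        try:
            tree = ET.parse(STORE)
            root = tree.getroot()
        except FileNotFoundError:
            root = ET.Element("messages")  # first run: create the store
            tree = ET.ElementTree(root)
        ET.SubElement(root, "message").text = text
        tree.write(STORE, encoding="utf-8", xml_declaration=True)

    def read_messages():
        return [m.text for m in ET.parse(STORE).getroot().findall("message")]

Note this rewrites the whole file on each append, which is fine for small stores but shares SQLite's weakness under concurrent multi-process writes.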
Would it be feasible to give each process its own SQLite3 database? They all ultimately use the central database anyway, right?
Have a look at Firebird.
You can use it as an embedded engine just like SQLite, but it can scale to a full-blown server as well.
The only drawback is that the documentation is a mess.
I'm writing an application that manipulates some sort of social network data, so the ideal underlying data structure is weighted directed graph. I'd like to do the manipulation (and searching) directly on the data, without first loading the entire graph into memory and serializing after.
This could be simulated using a standard SQL database or a key/value store, but that would be very inefficient for the graph-traversal algorithms I'd like to use (e.g. shortest path).
I have half a mind to write my own, since googling didn't turn up any useful results, but I'd much rather use an existing solution (if there is one and I missed it) than reinvent the wheel. The project is for fun / personal research, so the software would have to be open source (and preferably capable of running under Linux).
So, are there any projects that would fit the above description?
Thanks!
If you're using Java you can try http://neo4j.org/
What about an ODBMS? db4o has Java and .NET implementations, so both run on Linux.
You can also look at a graph as an array of nodes, where each node stores a list of its neighbours.
So you could simply store one file per node in your graph; the contents of that file is the list of nodes it points to (since the graph is directed).
Then you can read in a node as you need it.
This lets you do things like iterate through the whole graph while only keeping a single node's edges in memory at a time.
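A minimal Python sketch of that layout (the directory name and the one-neighbour-per-line format are assumptions; the edge weights the question needs could be stored next to each neighbour id):

    import os

    GRAPH_DIR = "graph"  # one file per node, named by node id

    def write_node(node_id, neighbours):
        os.makedirs(GRAPH_DIR, exist_ok=True)
        with open(os.path.join(GRAPH_DIR, str(node_id)), "w") as f:
            f.write("\n".join(str(n) for n in neighbours))

    def read_node(node_id):
        with open(os.path.join(GRAPH_DIR, str(node_id))) as f:
            return [line for line in f.read().splitlines() if line]

    def dfs(start):
        """Traverse the graph holding one edge list in memory at a time."""
        seen, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node not in seen:
                seen.add(node)
                yield node
                stack.extend(read_node(node))

    # Usage: write_node("a", ["b", "c"]); write_node("b", []);
    #        write_node("c", ["b"]); print(list(dfs("a")))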
For a program I am writing, I need a dictionary between Spanish and English words. I googled for a while, but I could not find any freely available database. Does anybody know where or how to get such a database (preferably as a simple CSV or XML file)?
So far my best idea for creating such a dictionary is to write a little program that looks up an English word on Wikipedia and uses the language links to extract the correct translation. But I don't want to make a million requests to Wikipedia just to generate this database...
I don't need anything fancy, just a mapping from one word to one (or possibly multiple) translations. Just like a regular dictionary.
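For what it's worth, the language-link idea can be sketched against the MediaWiki API (prop=langlinks). This makes one HTTP request per word, so it is only sensible for small word lists, exactly as the question worries, and article titles only approximate dictionary translations:

    import requests

    API = "https://en.wikipedia.org/w/api.php"

    def spanish_title(english_word):
        """Return the Spanish article title linked from an English
        article, or None if there is no such language link."""
        params = {
            "action": "query",
            "titles": english_word,
            "prop": "langlinks",
            "lllang": "es",
            "format": "json",
        }
        data = requests.get(API, params=params, timeout=10).json()
        for page in data["query"]["pages"].values():
            for link in page.get("langlinks", []):
                return link["*"]
        return None

    print(spanish_title("Dictionary"))  # e.g. "Diccionario"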
Ask around on the Omega Wiki, formerly known as the Ultimate Wiktionary or Wiktionary Z. They collect translations from all languages into all languages, and their data is available in a relational database.
Do you need to translate on the fly at runtime, or is this a one-time translation of labels and messages for a UI?
I'd say that runtime translation will be remarkably difficult, because you'll need more than a dictionary of words. Natural language processing is difficult in any language. Most languages need to know something about context to translate smoothly.
If it's a one-time translation of UI elements, I've had good luck using Google Translate to go from Japanese to English.
To answer your question, I don't have a database like that, sorry.
The problem with natural languages is that they are very context-dependent, so the same word in English can mean many things in French. Take the English verb 'to know': it can be translated into French as either 'savoir' (to know a fact) or 'connaître' (to know a person or a town).
I'd be very interested to know if such a database exists, but I doubt that it does.
Sites like http://www.reverso.net hedge their bets by showing both results.