Applying Planning to Web Search

This is a homework question: How would you apply the ideas of planning to Web search? Please answer at a high level; a paragraph or two will be sufficient.
I'm not sure what "planning at a high level" means here, and I don't know where to start. I have read some papers about information retrieval and web search but still can't connect them with planning. I hope someone can give me some hints, such as the relationship between planning and information retrieval or web search.

I am not sure which part of web search you are referring to, so I'll assume your question is about the connection between AI planning and web search in general.
A web crawler is an integral part of a search engine, and there are a number of AI planning algorithms used to decide which pages to crawl next and how frequently to revisit a domain.
Another area where AI planning helps is ad bidding. I'm not sure exactly what kinds of algorithms are used there; some searching should turn up specifics.
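To make the crawler connection concrete, crawl scheduling can be viewed as choosing the next action that maximizes expected utility given a cost model. Below is a hedged toy sketch (the scoring weights, URLs, and the value/cost numbers are made up for illustration; real crawlers use far richer models of page change rates and politeness constraints):

```python
# Toy sketch: crawl scheduling as greedy utility-first planning.
# Each frontier entry is (url, estimated_value, estimated_cost);
# the "plan" is simply to fetch the highest value/cost action first.
import heapq

def schedule(frontier):
    """Yield URLs in order of descending utility (value per unit cost)."""
    heap = [(-(value / cost), url) for url, value, cost in frontier]
    heapq.heapify(heap)
    while heap:
        _, url = heapq.heappop(heap)
        yield url

# Illustrative frontier: a frequently changing news page is worth more
# than a static archive page of the same fetch cost.
frontier = [
    ("http://example.com/news", 9.0, 1.0),
    ("http://example.com/archive", 2.0, 1.0),
    ("http://example.com/index", 8.0, 2.0),
]
print(list(schedule(frontier)))
```

A real system would replace the static scores with estimates learned from past crawls (e.g. how often each page changes), but the planning framing stays the same: pick the action with the best expected payoff.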

Related

Articles About Data Acquisition from Existing Sources

I'm writing a report about best practices for data acquisition from existing sources. However, I'm having trouble finding many articles or papers about what I need. Most of what I find either covers data acquisition from a physical source, e.g. a weather station, or glosses over really quickly the part where you talk to a different team about getting access to their data.
Essentially, I'm looking for articles that list some good questions to ask when you talk to other teams in your organization about getting access to their data so the BI team can use it. Even a search term that better describes what I'm looking for would help. At first I thought "data wrangling" might be it, but all my searches yield articles about manipulating the data into the shape you need.
To recap, I'm after pointers to articles, or search terms I could use to find articles, that cover best practices to follow when talking to other departments about getting access to their databases for BI purposes.
First of all, I recommend paying attention to your choice of search engine; there are many more of them than just Google. In my experience, there are often cases where information that one search engine could not find was found by another.
Here are the results of the query "how the analyst communicates with other departments" from Google search:
A set of tips for a communication analyst -
https://inside.getyourguide.com/blog/2020/7/28/10-tips-on-effective-communication-in-business-intelligence
A set of tips for an analyst on interdepartmental communication -
https://www.phocassoftware.com/business-intelligence-blog/four-ways-to-promote-interdepartmental-communication
An article about communication-oriented modeling to improve the efficiency of an analyst -
https://www.crystalloids.com/news/fully-communication-oriented-information-modelling
And here are the results of the same query but in DuckDuckGo:
A set of general interdepartmental communication tips -
https://www.themuse.com/advice/how-to-communicate-better-with-other-departments
An article on effective communication for an analyst - https://www.northeastern.edu/graduate/blog/communicating-with-data/
A set of tips for an analyst to improve communication skills -
https://www.modernanalyst.com/Resources/Articles/tabid/115/ID/1994/10-Ways-to-Hone-Your-Communication-Skills-as-a-Business-Analyst.aspx
And these are only two search engines; there are also Bing, Ecosia, Yandex, and Startpage. With the right query and a variety of search tools, you can find most information of interest.

Database suggestion (and possible readings) for a computation-heavy website

I'm building a website that will rely on heavy computation to make guesses and suggestions about objects (considering the user's preferences and those of users with similar profiles). Right now I'm using MongoDB for my projects, but I suppose I'll have to go back to SQL for this one.
Unfortunately my knowledge of the subject is at a high-school level. I know that there are a lot of relational databases, and I was wondering which might be most appropriate for this kind of heavily dynamic cluster analysis. I would also really appreciate suggestions for possible readings (free and online would be really nice, but I won't mind reading a book. Just maybe not a 1k-page one, if possible).
Thanks for your help, extremely appreciated.
Recommendations are typically a graph-like problem, so you should also consider looking into graph databases, e.g. Neo4j.
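To see why recommendations are graph-like, picture users and items as nodes with "likes" edges; recommending is then a two-hop traversal (you → users with overlapping tastes → their other items). A minimal sketch with made-up users and items (a graph database like Neo4j would express the same idea as a declarative traversal query):

```python
# Toy collaborative recommendation over a bipartite "likes" graph.
# Users and items here are purely illustrative.
from collections import Counter

likes = {
    "alice": {"book1", "book2"},
    "bob":   {"book1", "book3"},
    "carol": {"book2", "book3", "book4"},
}

def recommend(user):
    """Score items liked by similar users, weighted by taste overlap."""
    mine = likes[user]
    scores = Counter()
    for other, theirs in likes.items():
        if other == user:
            continue
        overlap = len(mine & theirs)  # shared likes = similarity weight
        for item in theirs - mine:    # only items the user hasn't seen
            scores[item] += overlap
    return [item for item, _ in scores.most_common()]

print(recommend("alice"))
```

Relational databases can do this with self-joins, but the joins multiply as the traversal deepens, which is exactly the workload graph databases are designed for.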

Semantic Search Engine

I want to design a Semantic Search engine for my final year Master's degree. I have been doing a fair amount of reading both casually on the web and academic papers so I am not a total noob in this field.
My aim is to build a semantic search engine that parses HTML content into its equivalent RDF triples, stores the triples in a triplestore, and answers queries against it using SPARQL. I want to do something out of the box, unlike the other students, so I decided to build a semantic search engine.
Right now I have a running search engine using Solr that performs keyword search; what I want to add is semantic search. I know some open-source Semantic Web (Web 3.0) tools, but I'm not sure whether they will be compatible with Solr.
So, can you please provide me some help with building this?
Thanks.
It sounds hard, and it is: you will not be able to capture everything.
You need a lot of data. Of course, there is already a lot of data arranged in formats like OWL and RDF which you may use (e.g. WordNet, YAGO, GeoNames, etc.), but although these are huge, they each cover only a very small portion of a possible discourse universe.
Developing good semantic search takes a lot of resources and brain power. Projects like KompParse at the German Research Center for Artificial Intelligence, which focus on only a small part of human conversation (gossip or buying furniture), have been running for several years with several employees and are still just "ok".
Understanding semantics has already been implemented in various search engines; take Google, for example, or Wolfram Alpha. So this topic might not be as "out of the box" as you think.
So I will side with user723630 and strongly advise you to focus on a smaller topic. You will still achieve a lot, but you will not get frustrated.
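Even a scoped-down project can demonstrate the core pipeline the question describes: extract (subject, predicate, object) triples and answer pattern queries over them. Here is a deliberately tiny in-memory sketch of that idea; the prefixed names and data are illustrative, and a real project would use an actual triplestore (e.g. Apache Jena) with genuine SPARQL rather than this toy pattern matcher:

```python
# Minimal triplestore sketch: facts as (subject, predicate, object)
# tuples, queried by pattern matching. `None` plays the role of a
# SPARQL variable in a basic graph pattern.

def match(triples, pattern):
    """Return all triples consistent with the pattern's bound slots."""
    s, p, o = pattern
    return [t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]

# Hypothetical facts extracted from crawled pages.
triples = {
    ("dbpedia:Berlin",  "rdf:type",    "dbpedia:City"),
    ("dbpedia:Berlin",  "dbo:country", "dbpedia:Germany"),
    ("dbpedia:Hamburg", "dbo:country", "dbpedia:Germany"),
}

# "Which things are in Germany?"
# ~ SELECT ?s WHERE { ?s dbo:country dbpedia:Germany }
hits = match(triples, (None, "dbo:country", "dbpedia:Germany"))
print(sorted(t[0] for t in hits))
```

The hard part the answer warns about is not this query layer but the extraction step: turning arbitrary HTML into correct triples at scale, which is why narrowing the domain matters so much.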

App Engine Full Text Search vs Geohashing for location queries

I'm thinking of porting an application from RoR to Python App Engine that is heavily geo search centric. I've been using one of the open source GeoModel (i.e. geohashing) libraries to allow the application to handle queries that answer questions like "what restaurants are near this point (lat/lng pair)" and things of that nature.
GeoModel uses a ListProperty, which creates a heavy index; that has me concerned about pricing, as I have about 10 million entities that would need to be loaded into production.
This article that I found this morning seems pretty scary in terms of costs:
https://groups.google.com/forum/?fromgroups#!topic/google-appengine/-FqljlTruK4
So my question is - is geohashing a moot concept now that Google has released their full text search which has support for geo searching? It's not clear what's going on behind the scenes with this new API though and I'm concerned the index sizes might be just as big as if I used the GeoModel approach.
The other problem with the search API is that it appears I'd have to create not only my models in the datastore but then replicate some of that data (GeoPtProperty and entity_key for the model it represents at a minimum) into Documents which greatly increases my data set.
Any thoughts on this? At the moment I'm contemplating scrapping this port as too expensive, although I've really enjoyed working in the App Engine environment so far and would love to get away from EC2 for some of my applications.
You're asking several questions here:
Is geohashing a moot concept? Probably not: I suspect the Search API uses geohashing, or something similar, for its location search.
Can you use the Search API instead of implementing it yourself? Yes, but I don't know the cost one way or the other.
Is geohashing expensive on App Engine? In the message thread the cost is bad due to high index write costs, so you'll have to engineer your geohashing data to minimize indexing. If GeoModel puts a lot of indexed values in the list, you may be in trouble; I wouldn't use it directly without knowing how its indexing works. My guess is that if you reduce the location accuracy you can reduce the number of indexed entries, and that could save you a lot of cost.
As mentioned in the thread, you could also run the geohashing in Cloud SQL.

FlockDB - What is it? And best cases for it uses

I just came across the FlockDB graph database (details at github /flockDB). Twitter claims it uses FlockDB as follows:
Twitter runs FlockDB on a large cluster of machines. We use it to store social graphs (who follows whom, who blocks whom) and secondary indices at Twitter.
At first glance, setting it up and trying it doesn't look straightforward. Has anyone already set this up or used it? If so, please answer the following general queries.
What kinds of applications is it better suited for? (Twitter claims it is simple and very rough; it remains to be seen what that means.)
How is FlockDB better than other graph DBs / NoSQL DBs? Have you set up FlockDB or used it for an application?
Any early advice?
Note: I am evaluating the FlockDB and other graph databases mainly for learning them. Perhaps, I will build an application for that.
FlockDB is still yet to be properly released by Twitter, which means the current version you are seeing won't run properly. Going by the history of commits, I guess within a couple of days you will see a stable version that you can build and test.
Compared to something like Neo4j, you could say FlockDB is not even a graph database. The toughest part of a graph database is how many levels of depth it can handle, and from FlockDB's limited documentation it seems it can't handle more than one level of depth. Where FlockDB wins compared to DBs like Neo4j is its low latency, high throughput, and inherently distributed nature.
Regarding applications: I guess it will be a great fit whenever you need social networking or Twitter-like behavior. I don't think many will find such use cases, though (who gets 20k friend requests per second?).
I just started looking into FlockDB. Right now I am planning to use it in my forum software: instead of user1-follows-user2 relationships, I plan to use it for user1-read-post1, user1-favorited-post1, and so on. Being one of the most active online communities, we get a lot of such traffic (reads/favorites). I can't think of any other use cases right now.
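The "one level of depth" point can be made concrete: FlockDB's model is essentially huge, fast edge sets answering single-hop questions. A hedged sketch of that data layout (names and data are illustrative; this is the shape of the model, not FlockDB's code):

```python
# Sketch of a FlockDB-style edge store: forward and backward adjacency
# sets supporting only one-hop queries and membership checks.
from collections import defaultdict

forward = defaultdict(set)   # user -> users they follow (or posts they read)
backward = defaultdict(set)  # user/post -> who points at it

def add_edge(src, dst):
    forward[src].add(dst)
    backward[dst].add(src)

add_edge("alice", "bob")
add_edge("carol", "bob")
add_edge("alice", "post1")   # the forum use case: user read/favorited a post

print(sorted(backward["bob"]))   # one hop: who follows bob?
print("post1" in forward["alice"])  # fast membership check

# What this layout deliberately does NOT support: multi-hop traversals
# such as friends-of-friends. That workload is where Neo4j-style
# databases fit instead.
```

Because each query touches only one adjacency set, the structure shards and replicates easily, which is where the low-latency, high-throughput claims come from.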
Don't miss OrientDB. It's a document-graph DBMS with a special operator for traversing relationships: http://code.google.com/p/orient/wiki/GraphDatabase
