I'm trying to figure out the most appropriate way to retrieve content (either a hash or a string) for a set of ids.
The Redis documentation describes a tagging system here in which sets are used to filter down books, but it does not mention how you would then get information about a book. You could obviously use MGET with a list of ids once you've filtered down to the ids, but this only really works if you're working with string values and not hashes. It also means that you need to return the ids to your application code and convert each "id" to "book:id". Is there a better way to do this?
There are many approaches you can take. One is MGET, as you suggested. What I generally do is use the SORT command. Assume we have 3 hashes as below, and 1 set containing the ids of those hashes:
HMSET 1 fname a lname b
HMSET 2 fname c lname d
HMSET 3 fname e lname f
SADD fetch_from_set 1
SADD fetch_from_set 2
SADD fetch_from_set 3
SORT fetch_from_set BY NOSORT GET *->fname GET *->lname
1) "a"
2) "b"
3) "c"
4) "d"
5) "e"
6) "f"
So by using this you get the values of fname and lname for every member of the set. Since BY NOSORT skips the actual sorting of the set, it should not hurt performance much.
Also, since Redis 2.8 there is the SCAN command. I have not used it, but you may want to look at it.
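The reply shape of SORT ... GET can be surprising at first, so here is a plain-Python sketch of what it returns (dicts and a list stand in for the hashes and the set, so no Redis server is needed; with redis-py the equivalent call is r.sort("fetch_from_set", by="nosort", get=["*->fname", "*->lname"])):

```python
# Simulated Redis state: one dict per hash, keyed by id.
# (A real Redis SET is unordered; with BY NOSORT the reply order is
# whatever order Redis walks the set in, so a list stands in here.)
hashes = {
    "1": {"fname": "a", "lname": "b"},
    "2": {"fname": "c", "lname": "d"},
    "3": {"fname": "e", "lname": "f"},
}
fetch_from_set = ["1", "2", "3"]

def sort_get(ids, fields):
    """Mimic SORT <set> BY NOSORT GET *->field ...: one flat reply list,
    with all requested fields emitted per member, in order."""
    reply = []
    for member in ids:
        for field in fields:
            reply.append(hashes.get(member, {}).get(field))
    return reply

print(sort_get(fetch_from_set, ["fname", "lname"]))
# -> ['a', 'b', 'c', 'd', 'e', 'f']
```

Note how the fields come back interleaved per member, which is exactly the six-line reply shown above.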
Related
I was using SQLite for an app in which I stored 8-10 columns, and I retrieved the data based upon a combination of any number of those attributes. Now I want to port to Redis, so I was building a test app for it.
But I cannot work out how to design my Redis schema so that I can retrieve the data based upon any of those attributes. Does anyone have any advice or experience?
I think the best advice is to avoid sticking to the relational model when porting something from an RDBMS to Redis. Beyond the model itself, an important difference is that you must focus on the data access paths as well as the data structures.
Redis does not include a query language (it only offers commands, à la memcached), and therefore cannot answer arbitrary queries. If an access path to the data is not part of the data structure, then the data cannot be efficiently retrieved.
Redis is not the best NoSQL store when it comes to supporting arbitrary queries. For instance, you would be better served by something like MongoDB.
Now, if you really want to implement this with Redis, you can use a strategy similar to the one used by tagging engines. Your records are stored in hash objects, and for each column that can appear in one of the arbitrary queries you need to support, you build reverse indexes using sets.
For instance:
# Set up the records: one hash object per record
hmset user:1 name Bilbo type Hobbit job None
hmset user:2 name Frodo type Hobbit job None
hmset user:3 name Gandalf type Maiar job Wizard
hmset user:4 name Aragorn type Human job King
hmset user:5 name Boromir type Human job Warrior
# Set up the indexes: one set per value per field
sadd name:Bilbo 1
sadd name:Frodo 2
sadd name:Gandalf 3
sadd name:Aragorn 4
sadd name:Boromir 5
sadd type:Hobbit 1 2
sadd type:Maiar 3
sadd type:Human 4 5
sadd job:None 1 2
sadd job:Wizard 3
sadd job:King 4
sadd job:Warrior 5
# Perform a query: we want the humans who happen to be a king
# We just have to calculate the intersection of the corresponding sets
sinterstore tmp type:Human job:King
sort tmp by nosort get user:*->name get user:*->job get user:*->type
1) "Aragorn"
2) "King"
3) "Human"
By combining union, intersection, and difference, more complex queries can be implemented. For non-discrete values, or for range-based queries, sorted sets (zset) have to be used (and can be combined with plain sets).
This method is usually quite fast if the values are selective enough. Note that you do not have the flexibility of an RDBMS, though (no regular expressions, no prefix search, and range queries are a pain to deal with, etc.).
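To show the whole pattern end to end, here is an illustrative plain-Python model of it (dicts for the hashes, Python sets for the reverse indexes, and set intersection standing in for SINTERSTORE; no Redis server involved):

```python
# The records, as they would live in the user:<id> hashes.
users = {
    "1": {"name": "Bilbo",   "type": "Hobbit", "job": "None"},
    "2": {"name": "Frodo",   "type": "Hobbit", "job": "None"},
    "3": {"name": "Gandalf", "type": "Maiar",  "job": "Wizard"},
    "4": {"name": "Aragorn", "type": "Human",  "job": "King"},
    "5": {"name": "Boromir", "type": "Human",  "job": "Warrior"},
}

# Build the reverse indexes: one set per value per field (the SADD step).
index = {}
for uid, rec in users.items():
    for field, value in rec.items():
        index.setdefault(f"{field}:{value}", set()).add(uid)

# "Humans who happen to be a king": intersect the two sets (SINTERSTORE),
# then pull the fields out of each matching hash (the SORT ... GET step).
ids = index["type:Human"] & index["job:King"]
result = [(users[i]["name"], users[i]["job"], users[i]["type"]) for i in sorted(ids)]
print(result)  # -> [('Aragorn', 'King', 'Human')]
```

The important point is that the indexes are maintained at write time; the query itself is just set algebra plus field lookups.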
I would like to store some information as follows (note, I'm not wedded to this data structure at all, but this shows you the underlying information I want to store):
{ user_id: 12345, page_id: 2, country: 'DE' }
In these records, user_id is a unique field, but the page_id is not.
I would like to translate this into a Redis data structure, and I would like to be able to run efficient searches as follows:
For user_id 12345, find the related country.
For page_id 2, find all related user_ids and their countries.
Is it actually possible to do this in Redis? If so, what data structures should I use, and how should I avoid the possibility of duplicating records when I insert them?
It sounds like you need two key types: a HASH key to store your user's data, and a LIST for each page that contains a list of related users. Below is an example of how this could work.
Load Data:
> RPUSH page:2:users 12345
> HMSET user:12345 country DE key2 value2
Pull Data:
# All users for page 2
> LRANGE page:2:users 0 -1
# All users for page 2 and their countries
> SORT page:2:users BY nosort GET # GET user:*->country GET user:*->key2
Remove User From Page:
> LREM page:2:users 0 12345
Repeat GETs in the SORT to retrieve additional values for the user.
I hope this helps, let me know if there's anything you'd like clarified or if you need further assistance. I also recommend reading the commands list and documentation available at the redis web site, especially concerning the SORT operation.
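The GET # trick in the SORT above (where # returns the list element itself) produces an interleaved reply; here is a plain-Python sketch of its shape (dicts stand in for the Redis keys, so this runs without a server):

```python
# Simulated keys: the page's user list and one user hash.
page_users = {"page:2:users": ["12345"]}
user_hashes = {"user:12345": {"country": "DE", "key2": "value2"}}

def sort_get(list_key, fields):
    """Mimic SORT <list> BY nosort GET # GET user:*->field ...
    GET # yields the element itself; each *->field pulls a hash field."""
    reply = []
    for member in page_users[list_key]:
        reply.append(member)  # GET #
        for field in fields:
            reply.append(user_hashes[f"user:{member}"].get(field))
    return reply

print(sort_get("page:2:users", ["country", "key2"]))
# -> ['12345', 'DE', 'value2']
```

Your application code then just walks the flat reply in strides of 1 + number of GET patterns.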
Since each user_id is unique and maps to a single country, keep them in a simple key-value pair; querying for a user's country is then O(1). Then keep some Redis sets, with the page_id as key and all the related user_ids as members.
My structure
cat:id:name -> name of category
cat:id:subcats -> set of subcategories
cat:list -> list of category ids
The following gives me a list of cat ids:
lrange cat:list 0 -1
Do I have to iterate over each id from the above command to get the name field in my script? That seems inefficient. How can I get a list of category names from Redis?
There are a couple of different approaches. You may want to have the values in the list be delimited/encoded strings that contain the id, the name, and any other values you need quick access to. I recommend JSON for interoperability and reasonable compactness, but there are more performant formats.
Another option is, as you said, to iterate. You can make this more efficient by getting all your ids in a single request and then using MGET, pipelining, or MULTI/EXEC to fetch all the names in a single round trip.
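To make the MGET route concrete, here is a small plain-Python sketch (the store dict stands in for Redis, and the sample category names are invented; with redis-py this would be two round trips, r.lrange("cat:list", 0, -1) followed by r.mget(keys)):

```python
# Simulated Redis state, following the key scheme from the question.
store = {
    "cat:list": ["7", "9"],
    "cat:7:name": "Electronics",
    "cat:9:name": "Books",
}

# Step 1: one request for all the ids (LRANGE cat:list 0 -1).
ids = store["cat:list"]

# Step 2: derive the name keys and fetch them in one batch (MGET).
keys = [f"cat:{i}:name" for i in ids]
names = [store.get(k) for k in keys]  # MGET cat:7:name cat:9:name
print(names)  # -> ['Electronics', 'Books']
```

Two round trips total, regardless of how many categories there are.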
Imagine that I have a model that describes the printers an office has. They could be ready to work or not (maybe in the storage area, or bought but not yet in the office ...). The model must have a field that represents the physical location of the printer ("Secretary's office", "Reception", ...). There cannot be two repeated locations, and if a printer is not working it should not have a location.
I want to have a list in which all printers appear, each with its location (if it has one). Something like this:
ID | Location
1 | "Secretary's office"
2 |
3 | "Reception"
4 |
With this I can see that two printers are working (1 and 3) and the others are offline (2 and 4).
A first approach for the model would be something like this:
class Printer(models.Model):
    brand = models.CharField( ...
    ...
    location = models.CharField(max_length=100, unique=True, blank=True)
But this doesn't work properly: you can only store one record with a blank location. The blank is stored as an empty string in the database, and the unique constraint rejects a second empty string for that field. Adding the "null=True" parameter behaves the same way, because instead of inserting a NULL value in the corresponding column, Django still stores an empty string by default.
Searching the web I found http://www.maniacmartin.com/2010/12/21/unique-nullable-charfields-django/, which tries to resolve the problem in different ways. He says the cleanest is probably the last one, in which he subclasses the CharField class and overrides some methods to store different values in the database. Here is the code:
from django.db import models

class NullableCharField(models.CharField):
    description = "CharField that obeys null=True"

    def to_python(self, value):
        if isinstance(value, models.CharField):
            return value
        return value or ""

    def get_db_prep_value(self, value):
        return value or None
This works fine. You can store multiple records with no location, because instead of an empty string it stores a NULL. The problem is that it displays the blank locations as None instead of as an empty string.
ID | Location
1 | "Secretary's office"
2 | None
3 | "Reception"
4 | None
I suppose there is a method (or several) where you can specify how the data must be converted between the model and the database, in both directions (database to model and model to database).
Is this the best way to have an unique, blank CharField?
Thanks,
You can use a model method to output the values in a custom way.
Like this (in your model class):
def location_output(self):
    "Returns location and replaces None values with an empty string"
    if self.location:
        return self.location
    else:
        return ""
Then you can use it in views like this.
>>> Printer.objects.create(location="Location 1")
<Printer: Printer object>
>>> Printer.objects.create(location=None)
<Printer: Printer object>
>>> Printer.objects.get(id=1).location_output()
u'Location 1'
>>> Printer.objects.get(id=2).location_output()
''
And in your templates, like this.
{{ printer.location_output }}
Try checking this thread:
Unique fields that allow nulls in Django
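Whichever field subclass you end up with, the conversion it must implement boils down to this symmetric pair (a plain-Python sketch of the idea only, not actual Django API):

```python
def to_db(location):
    """Model -> database: store NULL instead of "" so the UNIQUE
    constraint allows any number of blank locations."""
    return location or None

def from_db(location):
    """Database -> model: show "" instead of None in Python code
    and in templates."""
    return location or ""

print(to_db(""))             # -> None
print(from_db(None))         # -> ''
print(from_db("Reception"))  # -> 'Reception'
```

The field subclass in the question implements the first direction; the model-method answer above patches the second at display time.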
I have two tables.
In one table there are two columns: one has the ID, and the other the abstract of a document, about 300-500 words long. There are about 500 rows.
The other table has only one column and >18000 rows. Each cell of that column contains a distinct acronym, such as NGF, EPO, TPO etc.
I am interested in a script that will scan each abstract in table 1 and identify which of the acronyms in table 2 are present in it.
Finally the program will create a separate table where the first column contains the ID from table 1, paired with the acronyms found in the document associated with that ID.
Can someone with expertise in Python, Perl or any other scripting language help?
It seems to me that you are trying to join the two tables where the acronym appears in the abstract, i.e. (pseudo-SQL):
SELECT acronym.id, document.id
FROM acronym, document
WHERE acronym.value IN explode(documents.abstract)
Given the desired semantics you can use the most straightforward approach:
acronyms = ['ABC', ...]
documents = [(0, "Document zeros discusses the value of ABC in the context of..."), ...]

joins = []
for id, abstract in documents:
    for word in abstract.split():
        try:
            index = acronyms.index(word)
            joins.append((id, index))
        except ValueError:
            pass  # word not an acronym
This is a straightforward implementation; however, its running time is roughly documents × words per abstract × acronyms, because acronyms.index performs a linear search (over our largest list, no less). We can improve the algorithm by first building a hash index of the acronyms:
acronyms = ['ABC', ...]
documents = [(0, "Document zeros discusses the value of ABC in the context of..."), ...]

index = dict((acronym, idx) for idx, acronym in enumerate(acronyms))

joins = []
for id, abstract in documents:
    for word in abstract.split():
        try:
            joins.append((id, index[word]))
        except KeyError:
            pass  # word not an acronym
Of course, you might want to consider using an actual database. That way you won't have to implement your joins by hand.
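Following that last suggestion, a quick sketch with Python's built-in sqlite3 shows the join done by a real database (the sample rows are invented; note that instr() does plain substring matching, so word boundaries would still need handling):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE documents (id INTEGER PRIMARY KEY, abstract TEXT);
    CREATE TABLE acronyms (value TEXT);
""")
conn.executemany("INSERT INTO documents VALUES (?, ?)",
                 [(1, "NGF binds its receptor with high affinity ..."),
                  (2, "No relevant terms occur in this abstract.")])
conn.executemany("INSERT INTO acronyms (value) VALUES (?)",
                 [("NGF",), ("EPO",)])

# Join on "acronym occurs somewhere in the abstract".
rows = conn.execute("""
    SELECT d.id, a.value
    FROM documents AS d
    JOIN acronyms AS a ON instr(d.abstract, a.value) > 0
""").fetchall()
print(rows)  # -> [(1, 'NGF')]
```

The result rows are exactly the (document id, acronym) pairs the question asks for, and the database does the looping.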
Thanks a lot for the quick response.
I assume the pseudo-SQL solution is for MySQL etc. However, it did not work in Microsoft Access.
The second and third are for Python, I assume. Can I feed the acronyms and documents as input files?
babru
It didn't work in Access because tables are accessed differently (e.g. acronym.[id])