Python List of users - arrays

I want to learn how to create an API, so I want to start by creating an empty dictionary whose first key is Names. Names will hold the names of the users the system will have.
How do I actually do it with python?
People = [{}]
I want it to be something like:
People = [Names:["name1", "name2"... "nameN"]]
later on, I want to add more information like for example:
People[Names:[], Age:["1","2"]..]
At some stage I want to be able to relate any name to the entries under any other key correctly.
name1 has age 1 and next key...
How do I declare this dictionary?

Perhaps using a pandas DataFrame is useful in this case. As the following example illustrates, it allows you to easily add both people and variables.
import pandas as pd

df = pd.DataFrame({'Names': ['name1', 'name2'], 'Age': [1, 2]})
# adding a column: Gender
df['Gender'] = ['male', 'female']
# adding a row for name3, a three-year-old male
# (DataFrame.append was removed in pandas 2.0; pd.concat does the same job)
third_person = {'Names': 'name3', 'Age': 3, 'Gender': 'male'}
df = pd.concat([df, pd.DataFrame([third_person])], ignore_index=True)
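With the data in a DataFrame, relating a name to any other column is a plain lookup, for example:
# the age recorded for name1
age = df.loc[df['Names'] == 'name1', 'Age'].iloc[0]
print(age)  # 1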

What I was trying to do:
person = {
    "name": {
        "title": "Herr", "first": "foo", "last": "bar"
    },
    "email": "bar@foo.de",
    "password": "123",
    "property": "value"
}
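A minimal sketch of one way to get what the question describes: keep one dict per person in a list, so every field for a person stays attached to their name (the field names here are just illustrative):
people = [
    {"name": "name1", "age": 1},
    {"name": "name2", "age": 2},
]

def find_person(name):
    # return the first person dict whose name matches, or None
    for person in people:
        if person["name"] == name:
            return person
    return None

print(find_person("name1")["age"])  # prints 1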


How to get all vector ids from Milvus2.0?

I used to use Milvus1.0. And I can get all IDs from Milvus1.0 by using get_collection_stats and list_id_in_segment APIs.
These days I am trying Milvus2.0. And I also want to get all IDs from Milvus2.0. But I don't find any ways to do it.
milvus v2.0.x supports queries using boolean expressions.
This can be used to return all ids by checking that the primary key field is greater than or equal to zero.
Let's assume you are using this schema for your collection.
referencing: https://github.com/milvus-io/pymilvus/blob/master/examples/hello_milvus.py
as of 3/8/2022
from pymilvus import Collection, CollectionSchema, DataType, FieldSchema

fields = [
    FieldSchema(name="pk", dtype=DataType.INT64, is_primary=True, auto_id=False),
    FieldSchema(name="random", dtype=DataType.DOUBLE),
    FieldSchema(name="embeddings", dtype=DataType.FLOAT_VECTOR, dim=dim)
]
schema = CollectionSchema(fields, "hello_milvus is the simplest demo to introduce the APIs")
hello_milvus = Collection("hello_milvus", schema, consistency_level="Strong")
Remember to insert something into your collection first... see the pymilvus example.
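A minimal insert sketch with made-up values (column order matches the schema above; the linked hello_milvus example also creates an index on "embeddings" before loading):
import random

entities = [
    [0, 1, 2],                                                  # pk
    [random.random() for _ in range(3)],                        # random
    [[random.random() for _ in range(dim)] for _ in range(3)],  # embeddings
]
hello_milvus.insert(entities)
hello_milvus.flush()
# the collection must be loaded before query() will return results
hello_milvus.load()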
Here you want to query out all ids (pk)
You cannot currently list ids specific to a segment, but this would return all ids in a collection.
res = hello_milvus.query(
    expr="pk >= 0",
    output_fields=["pk", "embeddings"]
)

for x in res:
    print(x["pk"], x["embeddings"])
I think this is the only way to do it now, since they removed list_id_in_segment

generate dataframe of model predictions by looping through dictionary of models

I would like to loop through a dictionary like this:
models = {'OLS': LinearRegression(),
          'Lasso': Lasso(),
          'LassoCV': LassoCV(n_alphas=300, cv=3)}
and then I want to generate a DataFrame of each model's predictions.
So far I wrote this code, which only generates a list of arrays:
predictions = []
for i in models:
    predictions.append(models[i].fit(X_train, y_train).predict(X_test))
As the final result, I want a DataFrame in which each column is labelled with the model's key from the dictionary and contains that model's predictions.
Thank you!
Instead of appending the predictions to the list, you can directly insert the predictions into a data frame.
Code:
import pandas as pd
from sklearn.linear_model import LinearRegression, Lasso, LassoCV

models = {'OLS': LinearRegression(),
          'Lasso': Lasso(),
          'LassoCV': LassoCV(n_alphas=300, cv=3)}

df = pd.DataFrame()
for name in models:
    df[name] = models[name].fit(X_train, y_train).predict(X_test)
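Equivalently, the whole frame can be built in one step with a dict comprehension (the same X_train, y_train, and X_test are assumed to exist):
df = pd.DataFrame({name: model.fit(X_train, y_train).predict(X_test)
                   for name, model in models.items()})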

pandas to_sql in django: insert foreign key into DB

Is there a way to insert foreign keys when using pandas to_sql function?
I am processing uploaded Consultations (n=40k) with pandas in django, before adding them to the database (postgres). I got this working row by row, but that takes 15 to 20 minutes. This is longer than I want my users to wait, so I am looking for a more efficient solution.
I tried pandas to_sql, but I cannot figure out how to add the two foreign key relations as columns to my consultations dataframe before calling the to_sql function. Is there a way to add the Patient and Praktijk foreign keys as a column in the consultations dataframe?
More specifically, when inserting row by row, I use objects of type Patient or Praktijk when creating new consultations in the database. In a dataframe however, I cannot use these types, and therefore don't know how I could add the foreign keys correctly. Is there possibly a value of type object or int (a patient's id?) which can substitute a value of type Patient, and thereby set the foreign key?
The Consultation model:
class Consultation(models.Model):
    # the foreign keys
    patient = models.ForeignKey(Patient, on_delete=models.CASCADE, null=True, blank=True)
    praktijk = models.ForeignKey(Praktijk, on_delete=models.CASCADE, default='')
    # other fields which do not give trouble with to_sql
    patient_nr = models.IntegerField(blank=True, null=True)
    # etc
The to_sql call:
consultations.to_sql(Consultation._meta.db_table, engine, if_exists='append', index=False, chunksize=10000)
If above is not possible, any hints towards another more efficient solution?
I had the same problem and this is how I solved it. My answer isn't as straightforward, but I trust it helps.
Inspect your django project to be sure of two things:
Target table name
Table column names
In my case, I use class Meta when defining Django models to set an explicit table name (Django otherwise names tables automatically). I will use the Django tutorial project to illustrate.
class Question(models.Model):
    question_text = models.CharField(max_length=200)
    pub_date = models.DateTimeField('date published')

    class Meta:
        db_table = "poll_questions"

class Choice(models.Model):
    question = models.ForeignKey(Question, on_delete=models.CASCADE)
    choice_text = models.CharField(max_length=200)
    votes = models.IntegerField(default=0)

    class Meta:
        db_table = "question_choices"
Note: Django references a Question foreign key in the database using the pk of the Question object, stored in a column named question_id.
Assume I have a Question with pk 1, and a dataframe df with which I wish to update that Question's choices. My df must look like the one below if using pandas to batch-insert into the database!
import pandas as pd
df = pd.DataFrame(
    {
        # Django stores the ForeignKey in a column named question_id
        "question_id": [1, 1, 1, 1, 1],
        "choice_text": [
            "First Question",
            "Second Question",
            "Third Question",
            "Fourth Question",
            "Fifth Question"
        ],
        "votes": [5, 3, 10, 1, 13]
    }
)
The df looks like this:
   question_id      choice_text  votes
0            1   First Question      5
1            1  Second Question      3
2            1   Third Question     10
3            1  Fourth Question      1
4            1   Fifth Question     13
Now that we have our df, the next step is to create a database connection for inserting the records.
from django.conf import settings
from sqlalchemy import create_engine

# load database settings from django
user = settings.DATABASES['default']['USER']
passwd = settings.DATABASES['default']['PASSWORD']
dbname = settings.DATABASES['default']['NAME']

# create database connection string
conn_str = 'postgresql://{user}:{passwd}@localhost:5432/{dbname}'.format(
    user=user,
    passwd=passwd,
    dbname=dbname
)
# actual database connection object.
conn = create_engine(conn_str, echo=False)

# write df into db
df.to_sql("question_choices", con=conn, if_exists="append", index=False, chunksize=500, method="multi")
Voila!
We are done!
Note:
Django supports bulk_create, which, however, isn't what you asked for.
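Applied to the original question, the same idea looks roughly like the sketch below. Two assumptions of mine: each uploaded consultation carries a patient_nr that also exists on Patient, and a single Praktijk applies to the whole upload.
# Django stores a ForeignKey in a "<field>_id" column, so the dataframe
# needs integer pk columns named patient_id and praktijk_id.
patient_pk_by_nr = dict(
    Patient.objects.values_list('patient_nr', 'pk')  # assumes Patient has a patient_nr field
)
consultations['patient_id'] = consultations['patient_nr'].map(patient_pk_by_nr)
consultations['praktijk_id'] = praktijk.pk  # the Praktijk instance for this upload

consultations.to_sql(Consultation._meta.db_table, engine,
                     if_exists='append', index=False, chunksize=10000)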
I ran into a similar problem using SQLAlchemy, but I found a simple workaround.
What I did was define the database schema the way I wanted with SQLAlchemy (with all the datatypes and foreign keys I needed), create an empty table, and then simply change the if_exists parameter to append.
This appends all the data to the empty table, with the foreign keys already in place.
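A sketch of that workaround (the table and column names here are made up; adjust them to the real Django table names):
from sqlalchemy import Column, ForeignKey, Integer, MetaData, Table, create_engine

engine = create_engine('postgresql://user:passwd@localhost:5432/dbname')  # hypothetical DSN
metadata = MetaData()

consultation = Table(
    'app_consultation', metadata,  # hypothetical table name
    Column('id', Integer, primary_key=True),
    Column('patient_id', Integer, ForeignKey('app_patient.id')),
    Column('praktijk_id', Integer, ForeignKey('app_praktijk.id')),
    Column('patient_nr', Integer),
)
metadata.create_all(engine)  # creates the empty table, foreign keys included

# then append the dataframe into the now-existing table
df.to_sql('app_consultation', engine, if_exists='append', index=False)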

Match and do nothing if already exist

I need some help again please.
I am trying to create a list of items which I am calling from an Excel spreadsheet.
Let's say that columns A holds a list of countries.
America
South Africa
Belgium
America
Now there are other items attached to the countries in the corresponding row (up to column D), so there might be more items in other columns that correspond to the country in the first cell, like this:
A            | B       | C
-------------|---------|------
America      | Samsung | 1234
South Africa | Dell    | 54321
Belgium      | iPhone  | 2345
America      | Nokia   | 9876
I want to publish this to an XML sheet, but I do not want to create each country more than once, so I want to check whether an entry already exists and create it only if it does not. In the table above America appears twice, but it must be created only once as an XML entry, and from there I will attach the other items.
For now I am getting the row data by counting the rows in the sheet as it will differ each time, then I need to start writing XML.
use Spreadsheet::Read;
#use XML::Writer
my $book  = ReadData("InfoDB.xlsx");
my @rows  = Spreadsheet::Read::rows($book->[1]);
my $count = 1;
my @clause_all;
foreach my $tab (@rows) {
    $count++;
    my @row     = Spreadsheet::Read::cellrow($book->[1], $count);
    my $country = $row[1];
}
If anyone can please help me with matching this into an array or somehow it would be great!
I tried a whole lot of methods but cannot get a perfect result, I would actually bore you if I posted each try I attempted. :(
Create a hash and use the country names as keys.
Then push your new data onto an array reference stored at that key. This is pseudocode; you'll need to sprinkle in the spreadsheet madness to make it work.
my %countries;
foreach my $row (@rows) {
    my ($country, $thing, $number) = row2columns($row);
    push @{ $countries{$country} }, [ $thing, $number ];
}
now you have a big hash that you can convert to XML in your preferred manner.
Something along the lines of:
my @country;
foreach my $tab (@rows) {
    ...
    # The smart match operator (~~) returns true if the value on the
    # left is found in the list on the right. (Note: smart match is
    # marked experimental in recent versions of Perl.)
    unless ($row[1] ~~ @country) {
        # do the things you need to do, then add the country to the list
        push @country, $row[1];
    }
}

Best database design (model) for user tables

I'm developing a web application using Google App Engine and Django, but I think my problem is more general.
Users have the ability to create tables; note that these tables are not represented as TABLEs in the database. I'll give you an example:
First form:
Name of the the table: __________
First column name: __________
Second column name: _________
...
The number of columns is not fixed, but there is a maximum (100, for example). The type of every column is the same.
Second form (after choosing a particular table the user can fill the table):
column_name1: _____________
column_name2: _____________
....
I'm using this solution, but it's wrong:
class Table(db.Model):
    name = db.StringProperty(required=True)

class Column(db.Model):
    name = db.StringProperty(required=True)
    number = db.IntegerProperty()
    table = db.ReferenceProperty(Table, collection_name="columns")

class Value(db.Model):
    time = db.TimeProperty()
    column = db.ReferenceProperty(Column, collection_name="values")
When I want to list a table, I take its columns, and from every column I take its values:
data = []
for column in table.columns:
    column_data = []
    for value in column.values:
        column_data.append(value.time)
    data.append(column_data)
data = zip(*data)
I think the problem is the order of the values: it is not guaranteed that the order for one column is the same as for the others. I'm expecting this bug (though I have never actually seen it yet):

Table as I want:    Table as I might get:
a z c               a e c
d e f               d h f
g h i               g z i
Better solutions? Maybe using ListProperty?
Here's a data model that might do the trick for you:
class Table(db.Model):
    name = db.StringProperty(required=True)
    owner = db.UserProperty()
    column_names = db.StringListProperty()

class Row(db.Model):
    values = db.ListProperty(yourtype)
    table = db.ReferenceProperty(Table, collection_name='rows')
My reasoning:
You don't really need a separate entity to store column names. Since all columns are of the same data type, you only need to store the name, and the fact that they are stored in a list gives you an implicit order number.
By storing the values in a list in the Row entity, you can use an index into the column_names property to find the matching value in the values property (see the lookup sketch below).
By storing all of the values for a row together in a single entity, there is no possibility of values appearing out of their correct order.
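A minimal lookup sketch under this model (the table name 'my_table' and the column name 'Age' are made up):
# fetch a table by name and print one column of every row
table = Table.all().filter('name =', 'my_table').get()
col_index = table.column_names.index('Age')
for row in table.rows:
    print(row.values[col_index])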
Caveat emptor:
This model will not work well if the table can have columns added to it after it has been populated with data. To make that possible, every time a column is added, every existing row belonging to that table would have to have a value appended to its values list. If it were possible to efficiently store dictionaries in the datastore, this would not be a problem, but lists can really only be appended to.
Alternatively, you could use Expando...
Another possibility is to define the Row model as an Expando, which allows you to dynamically create properties on an entity. You could set column values only for the columns that have values in them, and you could also add columns to the table after it has data in it without breaking anything:
class Row(db.Expando):
    table = db.ReferenceProperty(Table, collection_name='rows')

    @staticmethod
    def __name_for_column_index(index):
        return "column_%d" % index

    def __getitem__(self, key):
        # Allows one to get at the columns of Row entities with
        # subscript syntax:
        #   first_row = Row.get()
        #   col1 = first_row[1]
        #   col12 = first_row[12]
        try:
            return getattr(self, Row.__name_for_column_index(key))
        except AttributeError:
            # The given column is not defined for this Row
            return None

    def __setitem__(self, key, value):
        # Allows one to set the columns of Row entities with
        # subscript syntax:
        #   first_row = Row.get()
        #   first_row[5] = "New value for column 5"
        setattr(self, Row.__name_for_column_index(key), value)
        # In order to allow efficient multiple column changes,
        # the put() can go somewhere else.
        self.put()
Why don't you add an IntegerProperty to Value for rowNumber, increment it every time you add a new row of values, and then reconstruct the table by sorting on rowNumber?
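A sketch of that suggestion against the asker's original models (row_number is the added property; everything else is unchanged):
class Value(db.Model):
    time = db.TimeProperty()
    row_number = db.IntegerProperty()  # the added property
    column = db.ReferenceProperty(Column, collection_name="values")

# rebuild one column in row order
column_data = [v.time for v in column.values.order('row_number')]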
You're going to make life very hard for yourself unless your user's 'tables' are actually stored as real tables in a relational database. Find some way of actually creating tables and use the power of an RDBMS, or you're reinventing a very complex and sophisticated wheel.
This is the conceptual idea I would use:
I would create two classes for the data-store:
table - this would serve as a dictionary, storing the structure of the pseudo-tables your app would create. It would have three fields: table_name, column_name, column_order, where column_order gives the position of the column within the table.
data - this would store the actual data in the pseudo-tables. It would have four fields: row_id, table_name, column_name, column_data. row_id would be the same for data pertaining to the same row and would be unique for data across the various pseudo-tables.
Put the data in a LongBlob.
The power of a database is being able to search and organise data so that you can get only the part you want, for performance and simplicity: you don't want the whole database, you just want a part of it, and you want it fast. But from what I understand, when you retrieve a user's data, you retrieve it all and display it. So you don't need to store the data in a normal "database" way.
What I would suggest is to simply format and store the whole data from a single user in a single column with a suitable type (LongBlob, for example). The format would be an object with a list of columns and a list of rows of values. And you define the object in whatever language you use to communicate with the database.
The columns in your (real) database would be : User int, TableNo int, Table Longblob.
If user 8 has 3 tables, you will have the following rows:
8, 1, objectcontainingtable1;
8, 2, objectcontainingtable2;
8, 3, objectcontainingtable3;
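A sketch of the serialization side, using JSON as the object format (the column and row contents here are made up):
import json

# a user's pseudo-table: column names plus rows of values
table_obj = {
    "columns": ["col1", "col2"],
    "rows": [["a", "b"],
             ["c", "d"]],
}

# what goes into the Table LongBlob column for (User=8, TableNo=1)
blob = json.dumps(table_obj).encode("utf-8")

# and read back when displaying the user's table
restored = json.loads(blob.decode("utf-8"))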
