Add label to node from a CSV file in NEO4J - database

I am trying to add some nodes to my graph database from a CSV file which, let's suppose, looks like this:
  |   city    | continent | feature_1 ...
--|-----------|-----------|--------------
0 | Barcelona | Europe    |
1 | Stockholm | Europe    |
2 | New York  | America   |
3 | Nairobi   | Africa    |
4 | Tokyo     | Asia      |
The first approach was to simply load this data as:
// Insert city nodes
USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM 'file:///city_data.csv' AS row
MERGE (city: City {name: row.city})
The next step was to incorporate the continent information, so I could have nodes of different colors. This means having two labels for each node, which is something I am not sure how to do. Anyway, for the moment I decided to simply have one label instead, containing the continent information. Since this information is within the CSV file, I believe the apoc.create.node procedure is the way to go. Hence, inspired by How to use apoc.load.csv in conjunction with apoc.create.node, I tried the following:
CALL apoc.load.csv('file:///city_data.csv') YIELD row
CALL apoc.create.node(['row.continent'], {name:['row.continent']}) YIELD node
RETURN count(*)
This does not raise any error, but it does something different from what I was thinking of: it sets the literal string "row.continent" itself as the label...

The problem is that you surrounded the variable in quotes, which makes it a literal string rather than the row's value. Try this:
CALL apoc.load.csv('file:///city_data.csv') YIELD row
CALL apoc.create.node([row.continent], {name: row.continent}) YIELD node
RETURN count(*)
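To get back to the original goal of two labels per node (a fixed City label plus the continent), note that apoc.create.node takes a list of labels. Here is a minimal sketch using the official neo4j Python driver; the URI, the credentials, and the YIELD map AS row aliasing (apoc.load.csv yields a column named map; this may vary by APOC version) are assumptions to adapt:

from neo4j import GraphDatabase

# Sketch only: the URI and credentials are placeholders.
driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

cypher = """
CALL apoc.load.csv('file:///city_data.csv') YIELD map AS row
CALL apoc.create.node(['City', row.continent], {name: row.city}) YIELD node
RETURN count(*)
"""

with driver.session() as session:
    # Each node gets both the fixed 'City' label and its continent label.
    print(session.run(cypher).single()[0])
driver.close()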

Related

Nested FutureBuilder vs nested calls for lazy loading from database

I need to choose the best approach between the two approaches that I can follow.
I have a Flutter app that uses sqflite to save data. Inside the database I have two tables:
Employee:
+-------------+-----------------+------+
| employee_id | employee_name |dep_id|
+-------------+-----------------+------+
| e12 | Ada Lovelace | dep1 |
+-------------+-----------------+------+
| e22 | Albert Einstein | dep2 |
+-------------+-----------------+------+
| e82 | Grace Hopper | dep3 |
+-------------+-----------------+------+
SQL:
CREATE TABLE Employee(
    employee_id TEXT NOT NULL PRIMARY KEY,
    employee_name TEXT NOT NULL,
    dep_id TEXT,
    FOREIGN KEY(dep_id) REFERENCES Department(dep_id)
        ON DELETE SET NULL
);
Department:
+--------+-----------+-------+
| dep_id | dep_title |dep_num|
+--------+-----------+-------+
| dep1 | Math | dep1 |
+--------+-----------+-------+
| dep2 | Physics | dep2 |
+--------+-----------+-------+
| dep3 | Computer | dep3 |
+--------+-----------+-------+
SQL:
CREATE TABLE Department(
    dep_id TEXT NOT NULL PRIMARY KEY,
    dep_title TEXT NOT NULL,
    dep_num INTEGER
);
I need to show a ListGrid of the departments that are referenced in the Employee table. I should look at the Employee table and fetch the department ids from it. This is easy, but after fetching those dep_ids I need to make a card from each of them, so I need information from the Department table: the complete information for the ids I fetched from the Employee table is inside the Department table.
There are thousands of rows in each table.
I have a database helper class, DbHelper, to connect to the database. It is something like this:
Future<List<String>> getDepartmentIds() async {
  // fetch all dep_id values from the Employee table
}

Future<Department> getDepartment(String id) async {
  // fetch the Department from the Department table for a specific id
}

Future<List<Department>> getEmployeeDepartments() async {
  // 1. fetch all dep_ids from the Employee table
  // 2. for each id, fetch the Department record from the Department table
  var ids = await getDepartmentIds();
  List<Department> deps = [];
  // a plain for loop awaits each lookup in turn; forEach with an async
  // callback would fire the futures without waiting for any of them
  for (var id in ids) {
    deps.add(await getDepartment(id));
  }
  return deps;
}
There are two approaches:
First one:
Define a function in DbHelper that returns all dep_ids from the Employee table (getDepartmentIds), and another function that returns a Department object (model) for a specific id (getDepartment).
Now I need two FutureBuilders, one inside the other: one for fetching the ids and the other for fetching the Department models.
Second one:
Define a function that first fetches the ids, then inside that function maps each id to a Department model (getEmployeeDepartments).
So I need one FutureBuilder.
Which one is better?
Should I let the FutureBuilders handle it, or should I put the pressure on DbHelper to handle it?
If I use the first approach, then I have to (as far as I can imagine!) put the second future call (the one that fetches the Department object (model) based on its id, getDepartment) in the build function, and it's recommended not to do so.
And the problem with the second one is that it does a lot of nested calls in DbHelper.
I used ListView.builder for performance.
I checked both with some data but couldn't figure out which one is better. I guess it depends on both Flutter and SQLite (sqflite).
Which one is better, or is there any better approach?
Given that I don't see too much code in this example, I'll give a high-level answer to your questions.
Evaluate Approach One
Right off the bat this part sticks out: "returns all dep_id from Employee table"
I would say scratch that, since "return all" is typically never a good solution, especially since you mention your tables have a lot of rows.
Evaluate Approach Two
I'm not sure what difference in performance this has compared to the first approach; it seems bad for the same reasons. I think this one just changes your UI logic a bit is all.
Typical 'Endless' List Approach
You would do a query on the Employees table with a join to the Departments table.
You would implement Pagination on your UI and pass in your values to the query from step one.
At a basic level you'll need these variables: Take, Skip, HasMore
Take: The count # of items to request each query
Skip: The count # of items to skip on the next query; this will be the number of items you currently have in the list in memory driving your UI.
HasMore: You can set this on the response of each query, to let the UI know if there are still more items or not.
As you scroll down the list and reach the bottom, you will request more items.
Initially issue a query for example: Take: 10, Skip: 0
Next query when you hit the bottom of the UI: Take: 10, Skip: 10
etc..
Example SQL query (SQLite, which sqflite wraps, uses LIMIT/OFFSET rather than SQL Server's OFFSET ... FETCH NEXT):
SELECT *
FROM Employee E
JOIN Department D ON D.dep_id = E.dep_id
ORDER BY E.employee_name
LIMIT {TAKE#} OFFSET {SKIP#}
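To make the take/skip mechanics concrete, here is a minimal sketch using Python's built-in sqlite3 module (the database path is an assumption; in the Flutter app the same query would go through sqflite):

import sqlite3

TAKE = 10  # number of items per page

def fetch_page(conn, skip):
    # Ask for one extra row so we can set HasMore without a second query.
    rows = conn.execute(
        """SELECT E.employee_name, D.dep_title
           FROM Employee E
           JOIN Department D ON D.dep_id = E.dep_id
           ORDER BY E.employee_name
           LIMIT ? OFFSET ?""",
        (TAKE + 1, skip),
    ).fetchall()
    has_more = len(rows) > TAKE
    return rows[:TAKE], has_more

conn = sqlite3.connect("app.db")  # hypothetical path
page1, has_more = fetch_page(conn, skip=0)   # initial query
page2, has_more = fetch_page(conn, skip=10)  # next query at bottom of UI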
Hopefully this helps; I'm not fully sure what you're trying to do, actually, in terms of code.
As far as I can tell, what you're looking to do is get a list of employees with relevant info including department.
If that's the case, then it's tailor made for INNER JOIN. Something like this:
SELECT Employee.*, Department.dep_id, Department.dep_title
FROM Employee INNER JOIN Department
ON Employee.dep_id = Department.dep_id;
(although you may want to double check that, my SQL is a bit rusty).
This would do what you need in one step. However, there is still the issue of what you're asking which seems to be "Is it more efficient to do many small requests or one big one, and what are the performance ramifications".
The answer to that is a bit specific to Flutter. What's happening when you do a request with sqflite is that it processes whatever you've passed to it, sends it to Java/ObjC (possibly doing more processing), and pushes the work to a background thread, which then calls the SQLite library, which does more processing to understand the request, then actually reads the data on disk to do the operation, then returns to the Java/ObjC layer, which pushes the response to the UI thread, which in turn responds back to Dart.
If that doesn't sound particularly efficient, that's because it isn't =D. If you're doing this a few times (or even a few hundred) it's probably fine, but if you're getting into thousands as you state it might start slowing down.
The alternative you've proposed is to do one large request. You will know better than I whether that is wise; if it's a couple thousand rows, but only ever a couple thousand, and the data you're returning is always going to be relatively small (i.e. just a 10-20 character name and department name), then you'll have say (20+20)*2000 = 80,000 bytes = 80 KB of data. Even if you assume the overhead will double that size, 160 KB of data shouldn't be enough to faze any relatively recent smartphone (after all, that's much smaller than any single photo!).
Now, taking some domain specific knowledge, you could optimize this. For example, if you know the number of departments is much smaller than employees (i.e. < 100 or something), you could skip the entire issue of doing joins, and simply request all departments before this begins and put it in a map (dep_id => dep_title), and then once you've requested employees you could just simply do that lookup from dep_id to dep_title yourself. That way your requests wouldn't have to include the dep_title over and over again.
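A rough sketch of that lookup-map idea, in plain Python/sqlite3 for brevity (the database path is an assumption):

import sqlite3

conn = sqlite3.connect("app.db")  # hypothetical path

# Fetch the small Department table once and index it in memory.
dept_titles = dict(conn.execute("SELECT dep_id, dep_title FROM Department"))

# Employee queries no longer need a join; resolve titles locally.
for name, dep_id in conn.execute("SELECT employee_name, dep_id FROM Employee LIMIT 100"):
    print(name, dept_titles.get(dep_id))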
That being said, you may want to consider paging the employee lookup whether or not you use a join. You'd do this by requesting 100 employees (or whatever number) at a time rather than the entire batch - that way you don't have the overhead of 1000+ calls through the stack, but you also don't have a large block of data in memory all at once.
SELECT * FROM Employee
WHERE employee_name > LastValue  -- strictly greater, so the previous page's last row isn't repeated
ORDER BY employee_name
LIMIT 100;
Unfortunately that doesn't fit in as well with how Flutter does lists, so you'd probably need something like an 'EmployeeDatabaseManager' that does the actual requests, and your list would call into it to get the data. That's probably beyond the scope of this question though.
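For what it's worth, here is a very rough sketch of what such a manager could look like, again in Python/sqlite3 for brevity (the class name, page size, and column order are all assumptions):

import sqlite3

class EmployeeDatabaseManager:
    # Keyset-paginated reads over the Employee table.
    def __init__(self, conn, page_size=100):
        self.conn = conn
        self.page_size = page_size
        self.last_name = ""  # employee_name of the last row served so far

    def next_page(self):
        rows = self.conn.execute(
            """SELECT employee_id, employee_name, dep_id FROM Employee
               WHERE employee_name > ?
               ORDER BY employee_name
               LIMIT ?""",
            (self.last_name, self.page_size),
        ).fetchall()
        if rows:
            self.last_name = rows[-1][1]  # column 1 is employee_name
        return rows

manager = EmployeeDatabaseManager(sqlite3.connect("app.db"))  # hypothetical path
first_batch = manager.next_page()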

Work around the SQLite parameter limit in a select query

I have a GUI application with a list of people which contains the person's database id and their attributes. Something like this:
+----+------+
| ID | Name |
+----+------+
| 1 | John |
| 2 | Fred |
| 3 | Mary |
[...]
This list can be filtered, so the number and set of people varies from time to time. To get a list of Peewee Person objects, I first get the list of visible IDs and use the following query:
ids = [row[0] for row in store]
Person.select().where(Person.id.in_(ids))
Which in turn translates to the following SQL:
('SELECT "t1"."id", "t1"."name" FROM "person" AS "t1" WHERE ("t1"."id" IN (?, ?, ?, ...))', [1, 2, 3, ...])
This throws an OperationalError: too many SQL variables error on Windows with more than 1000 people. This is documented in the Peewee and SQLite docs. Workarounds given online usually relate to bulk inserts and ways to split the action in chunks. Is there any way to work around this limitation with the mentioned SELECT ... WHERE ... IN query?
Getting the separate objects in a list comprehension is too slow:
people = [Person.get_by_id(row[0]) for row in store]
Maybe split the list of IDs in max 1000 items, use the select query on each chunk and then combine those somehow?
Where are the IDs coming from? The best answer is to avoid using that many parameters, of course. For example, if your list of IDs could be represented as a query of some sort, then you can just write a subquery, e.g.
my_friends = (Relationship
              .select(Relationship.to_user)
              .where(Relationship.from_user == me))
tweets_by_friends = Tweet.select().where(Tweet.user.in_(my_friends))
In the above, we could get all the user IDs from the first query and pass them en-masse as a list into the second query. But since the first query ("all my friends") is itself a query, we can just compose them. You could also use a JOIN instead of a subquery, but hopefully you get the point.
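For reference, the JOIN variant might look like this in Peewee (a sketch assuming the same Tweet/Relationship models as above):

tweets_by_friends = (Tweet
                     .select()
                     .join(Relationship, on=(Tweet.user == Relationship.to_user))
                     .where(Relationship.from_user == me))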
If this is not possible and you seriously have a list of >1000 IDs...how is such a list useful in a GUI application? Over 1000 anything is quite a lot of things.
To try and answer the question you asked -- you'll have to chunk them up. Which is fine. Just:
user_ids = list_of_user_ids
accum = []
# 100 at a time.
for i in range(0, len(user_ids), 100):
    query = User.select().where(User.id.in_(user_ids[i:i + 100]))
    accum.extend(query)
return accum
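Peewee also ships a chunked() helper that tidies the batching up a bit (same User model assumed):

from peewee import chunked

accum = []
for batch in chunked(user_ids, 100):
    accum.extend(User.select().where(User.id.in_(batch)))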
But seriously, I think there's a problem with the way you're implementing this that makes it even necessary to filter on so many ids.

Query datastore for the set of property values present

I have a property column which can have a subset of the following values at any point in time: { a | b | c | d | e }. By this, I mean that sometimes it can be any of { a | d | e }, or at another time it can even be { x | y | z }. How do I query the datastore, so that I can find out what subset is present at that point in time, without having to dig into each entity?
Presently I'm doing it this way:
people = Person.all().fetch(100)
city = set()
for p in people:
    city.add(p.address)
I want to get the set of property values that are present at this point in time (i.e. no duplicates). For example, at one point in time all 5,000,000 people have an address of { Manila | Cebu | Davao }, then I want the set(Manila, Cebu, Davao).
At another point in time, all 5,000,000 people will have an address of { Iloilo | Laoag }, then I want the set(Iloilo, Laoag).
Before any query, I would not know what the set should compose of.
My present method requires that I dig through all the entities. It's terribly inefficient; is there any better way?
In AppEngine, it's almost always better to generate and store what you might need, during write time.
So in your use case, every time you add or edit a person entity, you add the city they are in to another model that lists all the cities, and then store that cities entity as well.
import simplejson
from google.appengine.ext import db

class Cities(db.Model):
    # we'll use a stringified JSON list of cities
    list_of_cities = db.TextProperty(default="[]")

# when creating a new person / or when editing
person = Person(city=city)
cities = Cities.all().get()  # there's only one Cities entity that we'll use
list_of_cities = simplejson.loads(cities.list_of_cities)
if city not in list_of_cities:
    list_of_cities.append(city)  # add to the list of cities
cities.list_of_cities = simplejson.dumps(list_of_cities)
db.put(cities)
person.put()
You may want to use memcache on your cities entity to speed things up a bit. If you are also expecting to add more than one person in bursts of more than 1 write / second, then you might need to also consider sharding your list of cities.
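A minimal sketch of that memcache layer, reusing the Cities model and simplejson import from above (the cache key and expiry are assumptions):

from google.appengine.api import memcache

def get_city_list():
    cached = memcache.get("list_of_cities")
    if cached is not None:
        return simplejson.loads(cached)
    cities = Cities.all().get()
    memcache.set("list_of_cities", cities.list_of_cities, time=60)
    return simplejson.loads(cities.list_of_cities)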
An alternative to the approach suggested by Albert is to compute these values periodically using a mapreduce. The App Engine Mapreduce library makes this fairly straightforward: your mapper would output the city for each record, while the reducer would output each value and the number of times it occurs.
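In plain Python terms, the logic looks like this (this only illustrates the mapper/reducer roles, not the actual App Engine Mapreduce API):

from collections import Counter

def mapper(record):
    # emit the city for each record
    yield record["address"]

def reducer(emitted_cities):
    # count how many times each distinct city occurs
    return Counter(emitted_cities)

records = [{"address": "Manila"}, {"address": "Cebu"}, {"address": "Manila"}]
print(reducer(city for r in records for city in mapper(r)))
# Counter({'Manila': 2, 'Cebu': 1})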

How to merge two Excel sheets

I have an Excel document with 10000 rows of data in two sheets. The thing is, one of these sheets has the product costs, and the other has the category and other information. These two are imported automatically from the SQL server, so I don't want to move them to Access, but I still want to link the product codes so that when I merge the product tables (as product name and cost in the same table), I can be sure that I'm getting the right information.
For example:
Code | name | category
------------------------------
1 | mouse | OEM
4 | keyboard | OEM
2 | monitor | screen
Code | cost |
------------------------------
1 | 123 |
4 | 1234 |
2 | 1232 |
7 | 587 |
Let's say my two sheets have tables like these. As you can see, the second one has an entry that doesn't exist in the other - I put it there because in reality one sheet has a few more, preventing a perfect match. Therefore I couldn't just sort both tables A-Z and match the costs that way - as I said, there are more than 10000 products in that database, and I wouldn't want to risk a slight shift of costs (with those extra entries on the other table) that would ruin the whole table.
So what would be a good solution to get the entry from the other sheet and insert it into the right row when merging? Linking two tables with a field name? Checking a field and trying to match it with the other sheet? Anything at all.
Note: In Access I would create relationships, and when I ran a query it would match them automatically... I was wondering if there's a way to do that in Excel too.
Why not use a VLOOKUP? If there is a match, it will list the cost. Assuming the top table is on Sheet1 and the other on Sheet2, and they both start in cell A1, you just need this in cell D2:
=VLOOKUP(A2,Sheet2!A:B,2,0)
You can then drag it down. The easiest way to fill all your 10000 rows is to hover over the bottom right corner of the cell with your cursor. It will turn from a white plus sign into a thin black one. Then simply double click.
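If you ever script this outside Excel, the equivalent match in Python/pandas looks like this (the file and sheet names are assumptions):

import pandas as pd

products = pd.read_excel("products.xlsx", sheet_name="Sheet1")  # Code, name, category
costs = pd.read_excel("products.xlsx", sheet_name="Sheet2")     # Code, cost

# Left join on Code: products without a matching cost get NaN,
# just like VLOOKUP's #N/A.
merged = products.merge(costs, on="Code", how="left")
print(merged.head())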
Just use VLOOKUP - you can add a column to your first sheet and find the cost based on the code in the other sheet.

Split string returning a record

I have a single column table which looks like this:
Gaming | Austria | 47.9333, 15.1
Hebei | China | 39.8897, 115.275
This means that every row is a single string (VARCHAR) that contains some location, with the different fields separated by a pipe (|).
I would like to write a query that returns the following:
city     country    original
-------  ---------  ---------------------------------
Gaming   Austria    Gaming | Austria | 47.9333, 15.1
Hebei    China      Hebei | China | 39.8897, 115.275
Which means I want 3 columns: for the city, for the country, and the original column.
While splitting the city is easy (combination of CHARINDEX and SUBSTRING), extracting the country seems to be more challenging. The tricky part is to know the length of the country field in the string, so it could be extracted using SUBSTRING.
I realize I might have to write a SPLIT function in T-SQL, but I'm not sure how to write one that returns the data as a record and not as a table.
Hints and/or solutions will be more than welcome.
Just a matter of specifying the appropriate starting position and dynamically calculating the length based on the positions of the delimiters within the string - as d. shows above with some additional tweaking:
select substring(fieldName, 0, charindex('|', fieldName, 0)) as city,
       substring(fieldName,
                 charindex('|', fieldName, 0) + 1,
                 charindex('|', fieldName, charindex('|', fieldName, 0) + 1)
                   - charindex('|', fieldName, 0) - 1) as country,
       right(fieldName, charindex('|', reverse(fieldName), 0) - 1) as coordinates
Note that you may want to compare this with a CLR-based split function, as well as a variety of other possibilities which are outlined in this other serverfault thread.
CHARINDEX accepts a third parameter, which specifies the position to start looking from (note the search string comes first). That means you can locate the second delimiter, which bounds the country field, like this:
CHARINDEX('|', field, CHARINDEX('|', field) + 1)
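As a quick sanity check of the intended three-way split, here is the same logic in Python:

row = "Gaming | Austria | 47.9333, 15.1"
city, country, coords = (part.strip() for part in row.split("|", 2))
print(city, country, coords)  # Gaming Austria 47.9333, 15.1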
