I will start a project that needs a web and desktop interface. One solution seems to be IdeaBlade (http://www.ideablade.com).
Can anyone who uses it describe its limitations and advantages? Is it testable?
Thanks,
Alex
As VP of Technology at IdeaBlade, it's not really my place to comment generally on DevForce's limitations and advantages in this space. I'm happy to respond to specific questions, though.
Is it testable? To this I can respond with the beginnings of an answer.
It's a potentially contentious question; people have strong feelings about what makes something testable. Let me confine myself to specific testing scenarios ... and then you can judge the degree to which we meet your testing requirements.
1) DevForce supports pure POCO entities if that's your preference. Most people will prefer to use the entities that derive from our base Entity class so I will confine my subsequent remarks entirely to such entities.
2) You can new-up such an entity using any ctor you please and get and set its (non-navigation) properties with no other setup.
var cust = new Customer {ID=..., Name =...}; // have fun
Assembly references are required of course.
3) To test its navigation properties (properties that return other entities), you first new an EntityManager (our Unit-of-Work, context-like container), add or attach the entities to the EM, and off you go. Navigation properties of the Entities inheriting from our base class expect to find related entities through that container.
4) In most automated tests, the EntityManager will be created in a disconnected state so that it never attempts to reach a server or database.
You might add to it an Order, a Customer, some OrderDetails; note that all of them are constructed within the context of your tests ... not retrieved from anywhere.
Now order.Customer returns the test Customer, and order.OrderDetails returns your test details. Your preparation consists of creating the EM and the test entities, ensuring that the entities have unique IDs, and associating them.
Here's an example sequence:
var mgr = new EntityManager(false); // create disconnected
var order = new Order {ID = ..., Quantity = 1, ...};
var customer = new Customer {ID = 42, Name = "ABC"};
mgr.AttachEntity(order);
mgr.AttachEntity(customer);
order.Customer = customer; // associate them
The EM is acting as an in-memory database.
5) You can use LINQ now
var custs = mgr.Customers.Where(c => c.Name.StartsWith("A")).ToList();
var orders = mgr.Orders.Where(o => o.Customer.Name.StartsWith("A")).ToList();
6) Of course I always create a new EntityManager before each test to eliminate cross-test pollution.
7) I often write a so-called "Data Mother" test helper class to populate an EM with a standard collection of test data, including deviant cases.
8) I can export an EntityManager's cache of test entities to file or a test project resource. When tests run, a DataMother can retrieve and restore these test entities.
Observe that I am moving progressively away from unit testing and toward integration testing. But (so far) my tests do not require access to a server, or Entity Framework, or the database. They run fast and they are less vulnerable to distracting setup failures.
Of course you can get to the server in deep integration tests and you can easily switch servers and databases dynamically for local, LAN, and web scenarios.
9) You can intercept query, save, change, add, remove, and other events for interaction testing.
10) Everything I've described works in both regular .NET and Silverlight and with every test framework I've encountered.
On the downside, I wouldn't describe our product as mock-friendly.
I readily concede that we are not Persistence Ignorant (PI). If you're a PI fanatic, we're the wrong choice for you.
We try to appreciate the important benefits of PI and do our best to realize them in our product. We do what we can to shunt framework concerns out of view. Still, as you see, our abstraction leaks in a few places. For example, we add these members to the public API of your entities:
EntityAspect (the gateway to persistence awareness)
ErrorsChanged
PendingEntityResolved
PropertyChanged
ToQuery<>
Personally, I would have cut this to two (EntityAspect, PropertyChanged); the others snuck by me. For what it's worth, inheriting from Object (as you must) contributes another extraneous five.
We feel that we've made good trade-offs between pure P.I. and ease-of-development.
My question is "does it give you what you need to validate expectations without a lot of friction ... along the entire spectrum from unit to deep integration testing?"
I'm certainly curious to learn how you obtain similar facility with less friction in comparable products, and I'm eager to take suggestions on how we can improve our support for application testing.
Feel free to follow up with questions about other testing scenarios that I may have overlooked.
Hope this helps
Ward
I've got this model:
class Team(ndb.Model):
    name = ndb.StringProperty()
    password = ndb.StringProperty()
    email = ndb.StringProperty()

class Offer(ndb.Model):
    team = ndb.KeyProperty(kind=Team)
    cut = ndb.StringProperty()
    price = ndb.IntegerProperty()

class Call(ndb.Model):
    name = ndb.StringProperty()
    called_by = ndb.KeyProperty(kind=Team)
    offers = ndb.KeyProperty(kind=Offer, repeated=True)
    status = ndb.StringProperty(choices=['OPEN', 'CLOSED'], default="OPEN")
    dt = ndb.DateTimeProperty(auto_now_add=True)
and I've got this view:
class MainHandler(webapp2.RequestHandler):
    def get(self):
        calls_open = Call.query(Call.status == "OPEN").fetch()
        calls_past = Call.query(Call.status == "CLOSED").fetch()
        template_values = dict(open=calls_open, past=calls_past)
        template = JINJA_ENVIRONMENT.get_template('templates/index.html')
        self.response.write(template.render(template_values))
and this small test template:
{% for call in open %}
    <b>{{call.name}} {{call.called_by.get().name}}</b>
{% endfor %}
Now, with the get() it works perfectly.
My question is: is this correct?
Is there a better way to do it?
Personally I find it strange to get() the values in the template, and I would prefer to fetch them inside the view.
My idea was to:
create a new list: res_open_calls = []
for each call in calls_open, call to_dict(): dict_call = call.to_dict()
then assign the team to the dict: dict_call['team'] = call.team.get().to_dict()
append the object to the list: res_open_calls.append(dict_call)
then return this just-generated list (see the sketch below).
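A minimal sketch of that idea (untested; it assumes Call has a team KeyProperty, which my real code has):

res_open_calls = []
for call in calls_open:
    dict_call = call.to_dict()
    # swap the raw Key for the full team dict, so the template never
    # has to call get() itself
    dict_call['team'] = call.team.get().to_dict()
    res_open_calls.append(dict_call)
template_values = dict(open=res_open_calls, past=calls_past)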
This is the gist I wrote (for a modified version of the code): https://gist.github.com/esseti/0dc0f774e1155ac63797#file-call_offers_calls
It seems cleaner, but a bit more expensive (a second list has to be generated). Is there something better/cleverer to do?
The OP is clearly showing code different from the code they're actually running: they talk about a call.team that doesn't exist in the model they show... Anyway, I'm trying to guess what they actually have, because the underlying idea is important.
The OP is, IMHO, correct to be uncomfortable about having DB operations right in a Jinja2 template, which is best limited to presentation-level issues. I'll assume (guess!) that part of the Call model is:
class Call(ndb.Model):
    team = ndb.KeyProperty(kind=Team)
and that the relevant part of the Jinja2 template, currently working for the OP, is:
{{call.team.get().name}}
A better structure might then be:
class Call(ndb.Model):
    team = ndb.KeyProperty(kind=Team)

    @property
    def team_name(self):
        return self.team.get().name
and in the template just {{call.team_name}}.
This still performs the DB operation during template expansion, but it does so on the Python code side of things, rather than the Jinja2 side of things -- better than embodying so much detail about the model's data architecture in a template that should focus on presentation only.
Alternatively, if a Call instance is .put rarely and displayed often, and its team does not change name, one could, so to speak, cache the value in a ComputedProperty:
class Call(ndb.Model):
    team = ndb.KeyProperty(kind=Team)

    def _team_name(self):
        return self.team.get().name
    team_name = ndb.ComputedProperty(_team_name)
However, this latter choice is inferior (as it involves more storage space, does not save execution time, and complicates actual interactions with the datastore) unless some queries for Call entities also need to query on team_name (in which latter case it would be a must).
If one did choose this alternative, the Jinja2 template would still use {{call.team_name}}: this hints at why it's best to use in templates only logic strictly connected to presentation -- it leaves more degrees of freedom for implementing attributes and properties on the Python code side of things, without needing to change the templates. "Separation of concerns" is an excellent principle in programming.
The snippet posted elsewhere suggests a higher degree of complication, where Call is indeed as shown but then of course there is no call.team as shown repeatedly in the question -- rather, a double indirection via call.offers and each offer.team. This makes sense in terms of entity-relationship modeling but can be heavy-going to implement in the essentially "normalized" terms the snippet suggests in any NoSQL database, including GAE's datastore.
If teams don't change names, and calls don't change their list of offers, it might give better performance to denormalize the model (storing in Call the technically redundant information that, in the snippet, is fetched by running through the double indirection) -- e.g. by structured properties, https://cloud.google.com/appengine/docs/python/ndb/properties#structured , to embed copies of the Offer objects in Call entities, and a copy of the Team object (or even just the team's name) in each Offer entity.
Like all de-normalizing, this can take a few extra bytes per entity in the datastore, but nevertheless could amply pay for it by minimizing the number of datastore accesses needed at fetch time, depending on the pattern of accesses to the various entities and properties.
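In sketch form, using the models above (OfferCopy is an invented name, and the denormalized fields are assumptions):

class OfferCopy(ndb.Model):  # only ever embedded inside a Call
    team_name = ndb.StringProperty()
    cut = ndb.StringProperty()
    price = ndb.IntegerProperty()

class Call(ndb.Model):
    name = ndb.StringProperty()
    called_by = ndb.KeyProperty(kind=Team)
    offers = ndb.StructuredProperty(OfferCopy, repeated=True)
    status = ndb.StringProperty(choices=['OPEN', 'CLOSED'], default="OPEN")
    dt = ndb.DateTimeProperty(auto_now_add=True)

Fetching a Call now brings the offers, and each offer's team name, along in the same datastore read -- no further get() calls at render time.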
However, by now we're straying far away from the question, which is about what to put in the template and what on the Python side. Optimizing datastore patterns is a separate issue, well worth questions of its own.
Summarizing my stance on the latter, core issue of Python code vs. template as the residence for logic: data-access logic should be on the Python code side, ideally embedded in Model classes (using property for just-in-time access, possibly all the way to denormalization at entity-building or perhaps at entity-finalization time); Jinja2 templates (or any other kind of pure presentation layer) should only have logic directly needed for presentation, not for data access (nor business logic either, of course).
I am writing a datastore migration for our current production App Engine application.
We made some fairly extensive changes to the data model so I am trying to put in place an architecture to allow easier migrations in the future. This includes test suites for the migrations and common class structures for migration scripts.
I am running into a problem with my current strategy. For both the migrations and the test scripts I need a way to load the Model classes from the old schema and the Model classes for the new data schema into memory at the same time and load entities using either.
Here is an example set of schemas.
rev1.py
class Account(db.Model):
    _version = db.IntegerProperty(default=1)
    user = db.UserProperty(auto_current_user_add=True, required=True)
    name = db.StringProperty()
    contact_email = db.EmailProperty()
rev2.py
class Account(db.Model):
    _version = db.IntegerProperty(default=2)
    auth_id = db.StringProperty()
    name = db.StringProperty()
    pwd_hash = db.StringProperty(required=True, indexed=False)
A migration script may look something like:
import rev1
import rev2

class MyMigration(...):
    def isNeeded(self):
        num_accounts = num_entities_with_version(rev1.Account, 1)
        return num_accounts > 0

    def run(self):
        rev1_accounts = rev1.Account.all()
        for account in [a for a in rev1_accounts if a._version == 1]:
            auth_id = account.contact_email
            if auth_id is None or auth_id == '':
                auth_id = account.user.email()
            new_account = rev2.Account.create(auth_id=auth_id,
                                              name=account.name)
And a test suite would look something like this:
import rev1
import rev2

class MyTest(...):
    def testIt(self):
        # Set up data
        act1 = rev1.Account(name='..', contact_email='..')
        act1.put()
        act2 = rev1.Account(name='..', contact_email='..')
        act2.put()

        # Run migration
        migration.run()

        # Check results
        accounts = rev2.Account.all().fetch(99)
So as you can see I am using the old revision in two ways. I am using it in the migration as a way to read data in the old format and convert it into the new format. (note: I can't read it in the new format because of things like the required pwd_hash field and other field changes). I am using it in the test suite to setup test data in the old format before running the migration.
It all seems great in theory, but in practice it falls apart because GAE doesn't allow multiple models to be loaded for the same kind -- or, more specifically, queries only return instances of the most recently defined model.
In the development server this seems to be due to the fact that the process of calling get() on a query for an entity (ex: Account.get(my_key)) calls a result hook that builds the result Model object by calling class_for_kind on the entity kind name from the data. So even though I may call rev2.Account.get(), it may build up rev1.Account Model objects because the kind 'Account' maps to rev1.Account in the _kind_map dictionary.
This has made me rethink my migration strategy a bit and I wanted to ask if anyone has thoughts. Specifically:
Would it be safe to manually override google.appengine.ext.db._kind_map at runtime in test and on the production servers to allow this migration method to work?
Is there some better way to keep two versions of a Model in memory at the same time?
Is there a different migration method that may be a smarter way to go about this work?
Other methods I have thought of trying include:
Change the entity kind when the version changes (override kind() to change it; see the sketch after this list). Then when we migrate, we move everything to the new kind name.
Find a way to query the entities and get back a 'raw' object (proto buffers??) that has not been built into a full object. (would not work with tests)
'Just Do It Live': Don't write tests for any of this and just try to migrate using the latest schema loading the older data working around issues as the come up.
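For the first of those options, the kind change might look roughly like this (a sketch; the versioned kind name is invented):

from google.appengine.ext import db

class Account(db.Model):
    _version = db.IntegerProperty(default=2)
    auth_id = db.StringProperty()
    name = db.StringProperty()
    pwd_hash = db.StringProperty(required=True, indexed=False)

    @classmethod
    def kind(cls):
        # rev2 entities live under their own datastore kind, so they can
        # coexist with the old 'Account' kind during the migration
        return 'Account_v2'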
I think there are actually several questions within the greater question. There seem to be two key questions here, though: one is how to test, and the other is how to actually do the migration.
I wouldn't define the kind multiple times; as you've noted there are nuances to doing this, and, if you wind up with the wrong model loaded, you'll get all sorts of headaches. That said, it is completely possible for you to manipulate the kind_map. I've done this in some special cases, but I try to avoid it when possible.
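If you do manipulate it, the mechanics are small -- something like this hypothetical helper (it pokes at db's private _kind_map, so treat it strictly as a sketch):

from contextlib import contextmanager
from google.appengine.ext import db

@contextmanager
def use_model(model_class):
    # temporarily register model_class for its kind in db's kind map
    kind = model_class.kind()
    previous = db._kind_map.get(kind)
    db._kind_map[kind] = model_class
    try:
        yield model_class
    finally:
        if previous is None:
            db._kind_map.pop(kind, None)
        else:
            db._kind_map[kind] = previous

# hypothetical usage: read old-schema entities, then new-schema ones
with use_model(rev1.Account):
    old_accounts = rev1.Account.all().fetch(100)
with use_model(rev2.Account):
    new_accounts = rev2.Account.all().fetch(100)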
For a live migration where you've got significant schema changes, you've got two choices: use Expando or use the lower-level API. When adding required fields, you might find it easier to use Expando, then run a migration to add the new information, then switch back to a plain db.Model (sketched below). The lower-level API sits right under the ext.db stuff, and it presents the entity as a Python dict. This can be very convenient for manipulating an entity. Use whichever method you're more comfortable with. I prefer Expando when possible, since it is a higher-level interface, but it is a two-step process.
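A rough sketch of the Expando route against the question's schema (make_initial_pwd_hash is an invented placeholder, and this should be the only model registered for the 'Account' kind while the migration runs):

from google.appengine.ext import db

# temporarily declare Account as an Expando so entities missing the new
# required fields can still be loaded and re-saved
class Account(db.Expando):
    _version = db.IntegerProperty(default=1)

def migrate_batch(batch_size=100):
    for account in Account.all().filter('_version =', 1).fetch(batch_size):
        account.auth_id = account.contact_email or account.user.email()
        account.pwd_hash = make_initial_pwd_hash(account)  # placeholder
        account._version = 2
        account.put()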
For testing, I'd personally suggest you focus on the actual conversion routines. So instead of testing the method from the point of querying down, test to ensure your conversion routines themselves function correctly. You might even choose to pass in the old entity as a Python dict and return the new entity, as in the sketch below.
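For instance (a sketch; convert_account, the dict keys, and make_initial_pwd_hash are all assumptions):

def convert_account(old):
    # pure conversion: rev1 Account fields in, rev2 Account fields out
    auth_id = old.get('contact_email') or old['user_email']
    return {
        '_version': 2,
        'auth_id': auth_id,
        'name': old.get('name'),
        'pwd_hash': make_initial_pwd_hash(auth_id),  # placeholder
    }

def test_convert_account_falls_back_to_user_email():
    old = {'contact_email': '', 'user_email': 'a@b.com', 'name': 'A'}
    new = convert_account(old)
    assert new['auth_id'] == 'a@b.com'
    assert new['_version'] == 2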
I'd make one other adjustment here as well. I'd rather use a query to find all my rev 1 accounts; that's the great thing about having an indexed _version on your models -- you can trivially find the things that need migrating.
Also, check out Google's article on updating schemas. It is old, but still good.
Another approach is to simply do the migration on version 2, leaving the old attributes on the model and setting them to None after you update the version. This will clear out the space they use but will still leave them defined. Then in a following release you can remove them from the model.
This method is pretty simple, but it does require two releases to remove the old attributes completely, so it is more akin to deprecating the existing attributes (see the sketch below).
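In sketch form, using the question's schema (required=True is left off pwd_hash until every entity has been migrated, and make_initial_pwd_hash is an invented placeholder):

class Account(db.Model):
    _version = db.IntegerProperty(default=2)
    auth_id = db.StringProperty()
    name = db.StringProperty()
    pwd_hash = db.StringProperty(indexed=False)
    # deprecated rev1 fields -- kept so old entities still load; the
    # migration nulls them out, and a later release deletes them
    user = db.UserProperty()
    contact_email = db.EmailProperty()

def migrate(account):
    account.auth_id = account.contact_email or account.user.email()
    account.pwd_hash = make_initial_pwd_hash(account)  # placeholder
    account._version = 2
    account.user = None
    account.contact_email = None
    account.put()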
I'm working on a personal project using WPF with Entity Framework and Self-Tracking Entities. I have a WCF web service which exposes some methods for the CRUD operations. Today I decided to do some tests and see what actually travels over this service, and even though I expected something like this, I got really disappointed. The problem is that for a simple update (or delete) operation on just one object -- let's say a Category -- I send to the server the whole object graph, including all of its parent categories, their items, child categories and their items, etc. In my case it was a 170 KB XML file on a really small database (2 main categories, about 20 categories total, and about 60 items). I can't imagine what will happen if I have a really big database.
I tried to google for some articles concerning traffic optimization with STE, but with no success, so I decided to ask here if somebody has done something similar, knows some good practices, etc.
One of the possible ways I came out with is to get the data I need per object with more service calls:
return context.Categories.ToList(); // only the categories
...
return context.Items.ToList(); // only the items
Instead of:
return context.Categories.Include("Items").ToList();
This way the categories and the items will be separated and when making changes or deleting some objects the data sent over the wire will be less.
Have any of you faced a similar problem, and if so, how did you solve it?
We've encountered similar challenges. The first thing, as you already mentioned, is to keep the entities as small as possible (as dictated by the desired client functionality). The second, when sending entities back over the wire to be persisted: strip all navigation properties (nested objects) when they haven't changed. This sounds very simple but is not at all trivial. What we do is recursively dig into the entities present in the trackable collections of, say, the "topmost" entity (and their trackable collections, and theirs, and...) and remove them when their ChangeTracking state is "Unchanged". But be careful with this, because in some cases you still need these entities -- they may have been removed from or added to trackable collections of their parent entity (in which case you shouldn't remove them).
This, what we call "StripEntity", is also mentioned (though without any code sample) in Julie Lerman's Programming Entity Framework.
And although it might not be as efficient as a more purist approach, using STEs saves a lot of code for queries against the database. We are not in need of optimal performance in a high-traffic situation, so STEs suit our needs and take away a lot of the code needed to communicate with the database. You have to decide what the "best" solution is for your situation. Good luck!
You can find an Entity Framework project item at http://selftrackingentity.codeplex.com/. With version 0.9.8, I added a method called GetObjectGraphChanges() that returns an optimized entity object graph with only objects that have changes.
Also, there are two helper methods: EstimateObjectGraphSize() and EstimateObjectGraphChangeSize(). The first returns the estimated size of the whole entity object along with its object graph; the latter returns the estimated size of the optimized entity object graph with only the objects that have changes. With these two helper methods, you can decide whether it makes sense to call GetObjectGraphChanges() or not.
The buzzword in LINQ these days is "unit of work",
as in "only keep your datacontext in existence for one unit of work", then destroy it.
Well I have a few questions about that.
I am creating a fat client WPF application, so my data context needs to track the entire web of instantiated objects available to the user on the current screen. When can I destroy my datacontext?
I build a LINQ query over time, based on the actions of the user and their interactions with objects from the first datacontext. How can I create a new DataContext and execute the query on the new context?
I hope I was clear.
Thank you
Unit of Work is not the same as "only keep your datacontext in existence for one unit of work."
Unit of Work is a design pattern that describes how to represent transactions in an abstract way. It's really only required for Create, Update and Delete (CUD) operations.
One philosophy is that UoW is used for all CUD operations, while read-only Repositories are used for read operations.
In any case I would recommend decoupling object lifetime from UoW or Repository usage. Use Dependency Injection (DI) to inject both into your consuming services, and let the DI Container manage lifetimes of both.
In web applications, my experience is that the object context should only be kept alive for a single request (per-request lifetime). On the other hand, for a rich client like the one you describe, keeping it alive for a long time may be more efficient (singleton lifetime).
By letting a DI Container manage the lifetime of the object context, you don't tie yourself to one specific choice.
I am creating a fat client WPF application.
Ok.
So my data context needs to track the entire web of instantiated objects available to the user on the current screen.
No. Those classes are database mapping classes. They are not UI presentation classes.
How can I create a new DataContext and execute the query on the new context?
Func<DataContext, IQueryable<Customer>> queryWithoutADataContext =
    dc =>
        from cust in dc.Customers
        where cust.name == "Bob"
        select cust;

Func<DataContext, IQueryable<Customer>> moreFiltered =
    dc =>
        from cust in queryWithoutADataContext(dc)
        where cust.IsTall
        select cust;

var bobs = queryWithoutADataContext(new DataContext());
var tallbobs = moreFiltered(new DataContext());
I've heard that unit testing is "totally awesome", "really cool" and "all manner of good things", but 70% or more of my files involve database access (some read and some write) and I'm not sure how to write unit tests for them.
I'm using PHP and Python but I think it's a question that applies to most/all languages that use database access.
I would suggest mocking out your calls to the database. Mocks are basically objects that look like the object you are trying to call a method on, in the sense that they have the same properties, methods, etc. available to the caller. But instead of performing whatever action the real object would perform when a particular method is called, the mock skips that altogether and just returns a result. That result is typically defined by you ahead of time.
In order to set up your objects for mocking, you probably need to use some sort of inversion of control / dependency injection pattern, as in the following pseudo-code:
class Bar
{
    private FooDataProvider _dataProvider;

    public instantiate(FooDataProvider dataProvider) {
        _dataProvider = dataProvider;
    }

    public getAllFoos() {
        // instead of calling Foo.GetAll() here, we are introducing an
        // extra layer of abstraction
        return _dataProvider.GetAllFoos();
    }
}

class FooDataProvider
{
    public Foo[] GetAllFoos() {
        return Foo.GetAll();
    }
}
Now in your unit test, you create a mock of FooDataProvider, which allows you to call the method GetAllFoos without having to actually hit the database.
class BarTests
{
    public TestGetAllFoos() {
        // here we set up our mock FooDataProvider
        mockRepository = MockingFramework.new()
        mockFooDataProvider = mockRepository.CreateMockOfType(FooDataProvider);

        // create a new array of Foo objects
        testFooArray = new Foo[] {Foo.new(), Foo.new(), Foo.new()}

        // the next statement will cause testFooArray to be returned
        // every time we call FooDataProvider.GetAllFoos, instead of
        // calling to the database and returning whatever is in there
        // (ExpectCallTo and Returns are methods provided by our
        // imaginary mocking framework)
        ExpectCallTo(mockFooDataProvider.GetAllFoos).Returns(testFooArray)

        // now begins our actual unit test
        testBar = new Bar(mockFooDataProvider)
        baz = testBar.getAllFoos()

        // baz should now equal the testFooArray object we created earlier
        Assert.AreEqual(3, baz.length)
    }
}
A common mocking scenario, in a nutshell. Of course you will still probably want to unit test your actual database calls too, for which you will need to hit the database.
Ideally, your objects should be persistence ignorant. For instance, you should have a "data access layer" that you make requests to and that returns objects. This way, you can leave that part out of your unit tests, or test it in isolation.
If your objects are tightly coupled to your data layer, it is difficult to do proper unit testing. The first part of "unit test" is "unit": all units should be testable in isolation.
In my C# projects, I use NHibernate with a completely separate Data layer. My objects live in the core domain model and are accessed from my application layer. The application layer talks to both the data layer and the domain model layer.
The application layer is also sometimes called the "Business Layer".
If you are using PHP, create a specific set of classes ONLY for data access. Make sure your objects have no idea how they are persisted and wire up the two in your application classes.
Another option would be to use mocking/stubs.
The easiest way to unit test an object with database access is using transaction scopes.
For example:
[Test]
[ExpectedException(typeof(NotFoundException))]
public void DeleteAttendee() {
    using(TransactionScope scope = new TransactionScope()) {
        Attendee anAttendee = Attendee.Get(3);
        anAttendee.Delete();
        anAttendee.Save();

        // Try reloading. Instance should have been deleted.
        Attendee deletedAttendee = Attendee.Get(3);
    }
}
This will revert the state of the database, basically like a transaction rollback, so you can run the test as many times as you want without any side effects. We've used this approach successfully in large projects. Our build does take a while to run (15 minutes), but that is not horrible for 1800 unit tests. Also, if build time is a concern, you can change the build process to have multiple builds: one for building src, and another that fires up afterwards to handle unit tests, code analysis, packaging, etc.
I can perhaps give you a taste of our experience when we began looking at unit testing our middle-tier process, which included a ton of "business logic" SQL operations.
We first created an abstraction layer that allowed us to "slot in" any reasonable database connection (in our case, we simply supported a single ODBC-type connection).
Once this was in place, we were then able to do something like this in our code (we work in C++, but I'm sure you get the idea):
GetDatabase().ExecuteSQL( "INSERT INTO foo ( blah, blah )" )
At normal run time, GetDatabase() would return an object that fed all our SQL (including queries) via ODBC directly to the database.
We then started looking at in-memory databases -- the best by a long way seems to be SQLite (http://www.sqlite.org/index.html). It's remarkably simple to set up and use, and it allowed us to subclass and override GetDatabase() to forward SQL to an in-memory database that was created and destroyed for every test performed.
We're still in the early stages of this, but it's looking good so far. However, we do have to make sure we create any tables that are required and populate them with test data -- though we've reduced that workload somewhat by creating a generic set of helper functions that do a lot of this for us.
Overall, it has helped immensely with our TDD process, since making what seem like quite innocuous changes to fix certain bugs can have quite strange effects on other (difficult to detect) areas of your system -- due to the very nature of SQL/databases.
Obviously, our experiences have centred around a C++ development environment; however, I'm sure you could get something similar working under PHP/Python.
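In Python, for instance, the in-memory part comes almost free with the standard library (a sketch, assuming your code can accept any DB-API connection):

import sqlite3
import unittest

class FooTests(unittest.TestCase):
    def setUp(self):
        # a fresh in-memory database, created and destroyed per test
        self.db = sqlite3.connect(':memory:')
        self.db.execute('CREATE TABLE foo (blah TEXT)')

    def tearDown(self):
        self.db.close()

    def test_insert(self):
        self.db.execute("INSERT INTO foo (blah) VALUES ('x')")
        count, = self.db.execute('SELECT COUNT(*) FROM foo').fetchone()
        self.assertEqual(count, 1)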
Hope this helps.
You should mock the database access if you want to unit test your classes. After all, you don't want to test the database in a unit test. That would be an integration test.
Abstract the calls away and then insert a mock that just returns the expected data. If your classes don't do more than executing queries, it may not even be worth testing them, though...
The book xUnit Test Patterns describes some ways to handle unit-testing code that hits a database. I agree with the other people who are saying that you don't want to do this because it's slow, but you gotta do it sometime, IMO. Mocking out the db connection to test higher-level stuff is a good idea, but check out this book for suggestions about things you can do to interact with the actual database.
I usually try to break up my tests between testing the objects (and ORM, if any) and testing the db. I test the object side of things by mocking the data access calls, whereas I test the db side of things by testing the object's interactions with the db, which is, in my experience, usually fairly limited.
I used to get frustrated with writing unit tests until I started mocking the data access portion, so I didn't have to create a test db or generate test data on the fly. By mocking the data you can generate it all at run time and be sure that your objects work properly with known inputs.
Options you have:
Write a script that will wipe out the database before you start unit tests, then populate the db with a predefined set of data and run the tests. You can also do that before every test -- it'll be slow, but less error prone. (A sketch of this appears after the list.)
Inject the database (example in pseudo-Java, but it applies to all OO languages):
class Database {
    public Result query(String query) {... real db here ...}
}

class MockDatabase extends Database {
    public Result query(String query) {
        return "mock result";
    }
}

class ObjectThatUsesDB {
    public ObjectThatUsesDB(Database db) {
        this.database = db;
    }
}
Now in production you use the normal database, and for all tests you just inject the mock database, which you can create ad hoc.
Do not use the DB at all throughout most of the code (that's a bad practice anyway). Create a "database" object that, instead of returning raw results, returns normal objects (i.e., returns a User instead of a tuple {name: "marcin", password: "blah"}). Write all your tests with ad-hoc constructed real objects, and write one big test that depends on a database and makes sure this conversion works OK.
Of course these approaches are not mutually exclusive and you can mix and match them as you need.
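A sketch of the first option in Python (the table and seed data are invented for illustration):

import sqlite3
import unittest

SEED_ROWS = [('marcin',), ('blah',)]  # predefined test data

def reset_database(conn):
    # wipe and repopulate the schema with a known set of rows
    conn.execute('DROP TABLE IF EXISTS users')
    conn.execute('CREATE TABLE users (name TEXT)')
    conn.executemany('INSERT INTO users (name) VALUES (?)', SEED_ROWS)
    conn.commit()

class UserQueryTests(unittest.TestCase):
    def setUp(self):
        # slow but predictable: every test starts from the same state
        self.conn = sqlite3.connect('test.db')
        reset_database(self.conn)

    def tearDown(self):
        self.conn.close()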
Unit testing your database access is easy enough if your project has high cohesion and loose coupling throughout. This way you can test only the things that each particular class does without having to test everything at once.
For example, if you unit test your user interface class the tests you write should only try to verify the logic inside the UI worked as expected, not the business logic or database action behind that function.
If you want to unit test the actual database access you will actually end up with more of an integration test, because you will be dependent on the network stack and your database server, but you can verify that your SQL code does what you asked it to do.
The hidden power of unit testing for me personally has been that it forces me to design my applications in a much better way than I might without them. This is because it really helped me break away from the "this function should do everything" mentality.
Sorry I don't have any specific code examples for PHP/Python, but if you want to see a .NET example I have a post that describes a technique I used to do this very same testing.
I agree with the first post - database access should be stripped away into a DAO layer that implements an interface. Then, you can test your logic against a stub implementation of the DAO layer.
You could use mocking frameworks to abstract out the database engine. I don't know whether PHP/Python have any, but for typed languages (C#, Java, etc.) there are plenty of choices.
It also depends on how you designed your database access code, because some designs are easier to unit test than others, as the earlier posts have mentioned.
I've never done this in PHP and I've never used Python, but what you want to do is mock out the calls to the database. To do that, you can implement some IoC, whether via a 3rd-party tool or by managing it yourself; then you can implement a mock version of the database caller, which is where you will control the outcome of the fake call.
A simple form of IoC can be performed just by coding to interfaces. This requires some kind of object orientation going on in your code, so it may not apply to what you're doing (I say that since all I have to go on is your mention of PHP and Python). See the sketch below.
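In Python, for example, that can be as light as an abstract base class plus a hand-rolled fake (a sketch; all names invented):

from abc import ABC, abstractmethod

class UserStore(ABC):
    # the interface the rest of the code is written against
    @abstractmethod
    def find_user(self, name):
        ...

class DbUserStore(UserStore):
    def __init__(self, conn):
        self.conn = conn

    def find_user(self, name):
        row = self.conn.execute(
            'SELECT name FROM users WHERE name = ?', (name,)).fetchone()
        return row and row[0]

class FakeUserStore(UserStore):
    # test double: no database, just a dict
    def __init__(self, users):
        self.users = users

    def find_user(self, name):
        return name if name in self.users else None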
Hope that's helpful; if nothing else, you've now got some terms to search on.
Setting up test data for unit tests can be a challenge.
When it comes to Java, if you use the Spring APIs for unit testing, you can control transactions on a unit level. In other words, you can execute unit tests which involve database updates/inserts/deletes and roll back the changes. At the end of the execution, you leave everything in the database as it was before you started. To me, that is as good as it gets.