OpenAI GPT API pre-tokenizing?

I am trying to make a "personal assistant" chatbot (using GPT AI API) that can answer questions about myself when others ask it things. In order to do so, I have to give it a lot of information about myself, which I am currently doing in the prompt.
Example:
This means that every time someone asks a question, the prompt includes all of the information about me, which means that it gets tokenized every single time a question is asked. Is there a way to "pre-tokenize" the information about myself or store it in some other way? I ask this because the information about myself is what is costing me the most, as it sucks up a lot of tokens.
Thanks
I have tried looking online to no avail.

There are a couple of ways around this.
You could use the Playground to summarise your information, but this might lose some of the meaning. This is the simplest approach.
Probably the most effective approach would be to fine-tune your own model that already has the data. This is more time-consuming.
https://platform.openai.com/docs/guides/fine-tuning
This is from the OpenAI website:
Fine-tuned models.
Create your own custom models by fine-tuning our base models with your training data. Once you fine-tune a model, you’ll be billed only for the tokens you use in requests to that model.
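For illustration, here is a minimal sketch of what preparing fine-tuning training data could look like, using the JSONL chat format described in the guide linked above. The Q&A pairs, system prompt, and filename are all made up:

```python
import json

# Hypothetical Q&A pairs about yourself; in practice you would write
# many more examples covering the questions people actually ask.
examples = [
    ("Where did you go to school?", "I studied computer science at Example University."),
    ("What are your hobbies?", "I enjoy hiking and building chatbots."),
]

# Each training example is one JSON object per line (JSONL), using the
# chat-message structure described in OpenAI's fine-tuning guide.
with open("training_data.jsonl", "w") as f:
    for question, answer in examples:
        record = {
            "messages": [
                {"role": "system", "content": "You are a personal assistant answering questions about me."},
                {"role": "user", "content": question},
                {"role": "assistant", "content": answer},
            ]
        }
        f.write(json.dumps(record) + "\n")
```

Once the model is fine-tuned on data like this, the prompt no longer has to carry the full biography on every request, which is where the token savings come from.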

Related

How do you technically save how many people have visited a certain URL or a certain post?

I am trying to create a forum app using Django as a backend and React as a front end. I want to find out how many people have visited a post created by a user, so that I can store that as views and list the posts according to popularity.
I am just a student with no experience with live websites, so I'm wondering if it is okay to just record a visit in the componentDidMount lifecycle method. But I'm afraid the same user would be counted once for every visit, and the post creator could inflate his post's popularity just by spamming his own page.
I would suggest you implement this on the backend, not the front end. I don't know Django well, but there should be some way to know whether that particular post is getting requested. At that point, you'll want to increment the counter for that post.
The problem of course is determining which "views" count as real views. Was it the poster? Was it a robot? Was it a spider? Was it a scraper? Was it the same person who is not the poster viewing it many times?
This last part is not an easy thing to implement, and would probably take some trial and error before finding the right conditions to get your metrics "right".
As #Mike suggests above, there are many analytics packages which use sophisticated algorithms to determine "realness", and you may be able to use this data. My understanding is that you want to actually apply the data to sorting and UI in your app, not just view it on the dashboard of your analytics tool. I've never tried to look for one that supports an API to discover what you're interested in programmatically, but they probably all allow you to download structured data about your traffic. The problem with the latter approach is that it creates a delay and a manual step (always something to try to avoid, imo).
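To make the backend suggestion concrete, here is a minimal, framework-agnostic sketch of the dedup logic: each (post, visitor) pair is counted once, so refresh-spamming by the same visitor does not inflate the count. All names are hypothetical; in Django you would key on something like the session key or IP and persist the data in the database rather than in memory:

```python
from collections import defaultdict

class ViewCounter:
    """Counts unique views per post, one per (post, visitor) pair."""

    def __init__(self):
        self.counts = defaultdict(int)   # post_id -> unique view count
        self.seen = set()                # (post_id, visitor_id) pairs

    def record_view(self, post_id, visitor_id):
        key = (post_id, visitor_id)
        if key not in self.seen:         # only the first visit counts
            self.seen.add(key)
            self.counts[post_id] += 1

    def popularity_order(self):
        # post ids sorted by unique views, most popular first
        return sorted(self.counts, key=self.counts.get, reverse=True)
```

This does nothing about bots or scrapers, which is the harder filtering problem described above, but it at least removes the trivial self-inflation case.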

How to get book metadata?

My application needs to retrieve information about any published book based on a provided ISBN, title, or author. This is hardly a unique requirement---sites like Amazon.com, Chegg.com, and even software like Book Collector seem to be able to do this easily. But I have not been able to replicate it.
To clarify, I do not need to search the entire database of books---only a limited subset which have been inputted, as in a book collection. The database would simply allow me to tag the inputted books with the necessary metadata to enable search on that subset of books. So scale is not the issue here---getting the metadata is.
The options I have tried are:
Scrape Amazon. Scraping the regular Amazon pages was not very robust to things like missing authors, and while scraping the smaller mobile pages was faster, they shared the same issues with robustness of extraction. Plus, building this into an application is a clear violation of Amazon's Terms of Service.
Scrape the Library of Congress. While this seems to have fewer legal ramifications, ease and robustness were again issues.
ISBNdb.com API. While the service is free up to a point, and does a good job of returning the necessary metadata, I need to do this for over 500 books on a daily basis, at which point this service costs money proportional to use. I'd prefer a free or one-time payment solution that allows me to do the same.
Google Book Data API. While this seems to provide the information I need, I cannot display the book preview as their terms of service requires.
Buy a license to a database of books. For example, companies like Ingram or Baker & Taylor provide these catalogs to retailers and libraries. This solution is obviously expensive, so I'm hoping that there's a more elegant solution I've missed. But if not, and someone on SO has had a good experience with a particular database, I'm willing to go with that.
I've tried to describe my approach in detail so others with fewer books can take advantage of the above solutions. But given my requirements, I'm at my wits' end for retrieving book metadata.
Since it is unlikely that you have to retrieve the same 500 books every day: store the data retrieved from isbndb.com in a database and fill it up book by book.
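A sketch of that "fill it up book by book" idea: look up each ISBN at most once and serve repeats from a local store, so repeated daily runs only pay for ISBNs you have never seen. The fetch function here is a hypothetical stand-in for a real isbndb.com API call, and the in-memory dict stands in for a real database:

```python
class MetadataCache:
    """Caches per-ISBN metadata so each ISBN is fetched at most once."""

    def __init__(self, fetch):
        self.fetch = fetch     # callable: isbn -> metadata dict
        self.store = {}        # local "database" of fetched metadata

    def lookup(self, isbn):
        if isbn not in self.store:      # only hit the API for new ISBNs
            self.store[isbn] = self.fetch(isbn)
        return self.store[isbn]
```

With 500 books a day but a mostly stable collection, the paid API traffic quickly drops to just the newly added titles.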
Instead of scraping Amazon, you can use the API they expose for their affiliate program: https://affiliate-program.amazon.com/gp/advertising/api/detail/main.html
It allows about 3k requests per hour and returns well-formed XML. It requires you to set a link to the book that you show the information about, and you must state that you are an affiliate partner.
This might be what you're looking for. They even offer a complete download!
https://openlibrary.org/data
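As a sketch of what consuming Open Library data might look like, here is a function that pulls title and author out of a Books API-style JSON response. The sample response below is hand-made in the rough shape the API returns; a real call would fetch something like `https://openlibrary.org/api/books?bibkeys=ISBN:<isbn>&format=json&jscmd=data`, so treat the exact field names as assumptions to verify against their docs:

```python
import json

# Hand-made sample in the approximate shape of an Open Library
# Books API (jscmd=data) response, keyed by "ISBN:<number>".
sample_response = json.dumps({
    "ISBN:9780596520687": {
        "title": "JavaScript: The Good Parts",
        "authors": [{"name": "Douglas Crockford"}],
        "publish_date": "2008",
    }
})

def extract_metadata(response_text, isbn):
    """Pull a small metadata dict for one ISBN out of a response body."""
    data = json.loads(response_text)
    book = data.get(f"ISBN:{isbn}")
    if book is None:
        return None
    return {
        "title": book.get("title"),
        "authors": [a["name"] for a in book.get("authors", [])],
        "year": book.get("publish_date"),
    }

meta = extract_metadata(sample_response, "9780596520687")
```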
It seems that many libraries and other organisations make bibliographic information such as ISBNs available through MAchine-Readable Cataloging, aka MARC; you can find more information about it there as well.
Now that I knew the "right" term to search for, I discovered WorldCat.org.
Maybe this whole MARC angle gives you a new kind of idea :)

Looking for an example of when screen scraping might be worthwhile

Screen scraping seems like a useful tool - you can go onto someone else's site and steal their data - how wonderful!
But I'm having a hard time with how useful this could be.
Most application data is pretty specific to that application even on the web. For example, let's say I scrape all of the questions and answers off of StackOverflow or all of the results off of Google (assuming this were possible) - I'm left with data that is not very useful unless I either have a competing question and answer site (in which case the stolen data will be immediately obvious) or a competing search engine (in which case, unless I have an algorithm of my own, my data is going to be stale pretty quickly).
So my question is, under what circumstances could the data from one app be useful to some external app? I'm looking for a practical example to illustrate the point.
It's useful when a site publicly provides data that is (still) not available as an XML service. I had a client who used scraping to pull flight tracking data into one of his company's intranet applications.
The technique is also used for research. I had a client who wanted to compare the contents of several online dictionaries by part of speech, and all of these sites had to be scraped.
It is not a technique for "stealing" data. All ordinary usage restrictions apply. Many sites implement CAPTCHA mechanisms to prevent scraping, and it is inappropriate to work around these.
A good example is StackOverflow - no need to scrape data as they've released it under a CC license. Already the community is crunching statistics and creating interesting graphs.
There's a whole bunch of popular mashup examples on ProgrammableWeb. You can even meet up with fellow mashupers (O_o) at events like BarCamps and Hack Days (take a sleeping bag). Have a look at the wealth of information available from Yahoo APIs (particularly Pipes) and see what developers are doing with it.
Don't steal and republish, build something even better with the data - new ways of understanding, searching or exploring it. Always cite your data sources and thank those who helped you. Use it to learn a new language or understand data or help promote the semantic web. Remember it's for fun not profit!
Hope that helps :)
If the site has data that would benefit from being accessible through an API (and it would be free and legal to do so), but they just haven't implemented one yet, screen scraping is a way of essentially creating that functionality for yourself.
Practical example -- screen scraping would allow you to create some sort of mashup that combines information from the entire SO family of sites, since there's currently no API.
Well, to collect data from a mainframe. That's one reason why some people use screen scraping. Mainframes are still in use in the financial world, often running software that was written in the previous century. The people who wrote it might already be retired, and since this software is very critical for these organizations, they really hate it when new code needs to be added. So screen scraping offers an easy interface to communicate with the mainframe, collect information from it, and send it onwards to any process that needs it.
Rewrite the mainframe application, you say? Well, software on mainframes can be very old. I've seen software on mainframes that was over 30 years old, written in COBOL. Often, those applications work just fine and companies don't want to risk rewriting parts because it might break some code that had been working for over 30 years! Don't fix things if they're not broken, please. Of course, additional code could be written but it takes a long time for mainframe code to be used in a production environment. And experienced mainframe developers are hard to find.
I myself had to use screen scraping in a software project too. This was a scheduling application which had to capture the console output of every child process it started. It's the simplest form of screen scraping, actually, and many people don't even realize that if you redirect the output of one application to the input of another, it's still a kind of screen scraping. :)
Basically, screen scraping allows you to connect one (web) application with another one. It's often a quick solution, used when other solutions would cost too much time. Everyone hates it, but the amount of time it saves still makes it very efficient.
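The console-capture case from the answer above can be sketched in a few lines: start a child process and "scrape" whatever it writes to stdout. Here the child is just another Python interpreter printing one line:

```python
import subprocess
import sys

# Run a child process and capture its console output instead of
# letting it go to the terminal.
result = subprocess.run(
    [sys.executable, "-c", "print('job finished: 3 files processed')"],
    capture_output=True,   # capture the child's stdout/stderr
    text=True,             # decode bytes to str
)
captured = result.stdout.strip()
```

A real scheduler would then parse `captured` for status lines, which is exactly the fragile-but-practical pattern the answer describes.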
Let's say you wanted to get scores from a popular sports site that did not offer the information available with an XML feed or API.
For one project we found a (cheap) commercial vendor that offered translation services for a specific file format. The vendor didn't offer an API (it was, after all, a cheap vendor) and instead had a web form to upload and download from.
With hundreds of files a day, the only way to do this was to use WWW::Mechanize in Perl, screen-scrape its way through the login and upload boxes, submit the file, and save the returned file. It's ugly and definitely fragile (if the vendor changes the site in the least, it could break the app), but it works. It's been working now for over a year.
One example from my experience.
I needed a list of major cities throughout the world with their latitude and longitude for an iPhone app I was building. The app would use that data along with the geolocation feature on the iPhone to show which major city each user of the app was closest to (so as not to show exact location), and plot them on a 3D globe of the earth.
I couldn't find an appropriate list in XML/Excel/CSV type format anywhere easily, but I did find this wikipedia page with (roughly) the info I needed. So I wrote up a quick script to scrape that page and load the data into a database.
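A quick script like that can be sketched with just the standard library. The HTML below is a made-up stand-in for the page structure; a real scraper would fetch the page and tolerate messier markup:

```python
from html.parser import HTMLParser

# Made-up stand-in for a page with a city/latitude/longitude table.
HTML = """
<table>
  <tr><td>London</td><td>51.5</td><td>-0.1</td></tr>
  <tr><td>Tokyo</td><td>35.7</td><td>139.7</td></tr>
</table>
"""

class CityTableParser(HTMLParser):
    """Collects (name, lat, lon) tuples from three-column table rows."""

    def __init__(self):
        super().__init__()
        self.rows, self.row = [], None
        self.in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.row = []
        elif tag == "td":
            self.in_cell = True

    def handle_endtag(self, tag):
        if tag == "tr" and self.row:
            name, lat, lon = self.row
            self.rows.append((name, float(lat), float(lon)))
            self.row = None
        elif tag == "td":
            self.in_cell = False

    def handle_data(self, data):
        if self.in_cell:
            self.row.append(data.strip())

parser = CityTableParser()
parser.feed(HTML)
# parser.rows now holds the extracted city records, ready to load
# into a database.
```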
Any time you need a computer to read the data on a website. Screen scraping is useful in exactly the same instances that any website API is useful. Some websites, however, don't have the resources to create an API themselves; screen scraping is the developer's way around that.
For instance, in the earlier days of Stack Overflow, someone built a tool to track changes to your reputation over time, before Stack Overflow itself provided that feature. The only way to do that, since Stack Overflow has no API, was to screen scrape.
The obvious case is when a webservice doesn't offer reverse search. You can implement that reverse search over the same data set, but it requires scraping the entire dataset.
This may be fair use if the reverse search also requires significant pre-processing, e.g. because you need to support partial matching. The data source may not have the technical skills or computing resources to provide the reverse search option.
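The pre-processing step mentioned above can be sketched as building an index over the scraped dataset once, so partial matches can be answered without rescanning every record. The dataset and names here are hypothetical:

```python
from collections import defaultdict

# Hypothetical scraped dataset: record id -> text.
records = {
    1: "flight tracker pro",
    2: "flight status board",
    3: "train tracker",
}

# Inverted index: word -> set of record ids containing that word.
index = defaultdict(set)
for rec_id, text in records.items():
    for word in text.split():
        index[word].add(rec_id)

def search(fragment):
    """Partial match: return ids of records containing any indexed
    word that has `fragment` as a substring."""
    hits = set()
    for word, ids in index.items():
        if fragment in word:
            hits |= ids
    return sorted(hits)
```

Scanning the index keys is still linear in the vocabulary, but the vocabulary is far smaller than the full dataset; a production version might use a trie or n-gram index instead.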
I use screen scraping daily. I run some eCommerce sites and have screen-scraping scripts running daily to gather product lists automatically from my suppliers' wholesale sites. This lets me keep up-to-date information on all the products available to me from several suppliers, and lets me flag non-economical margins due to price changes.

Designing a database

Basically my job is to develop web applications using a database as a backend. What I have been doing till now is:
Based on the requirements of the client, I draw a basic sketch of what the tables are, how they look, the fields in those tables, and the one-to-one, many-to-one, or many-to-many relations between them.
Although I am not perfect at these things, I try to figure out how the relations should be from past projects that I worked on. But there are still some doubts about this in my mind.
If the client asks for a particular piece of data, I try to produce it either through a direct SQL query or through a script (in most cases PHP) if I am unable to figure out a query for that particular request.
Now, here comes my question.
Based on the relationships that I figured out while developing the tables, are there any limitations to what a client can ask? What I mean to say by this is: the client will ask to list all the individual products, their counts, the associated categories, the counts per category, the products in each category and their prices, the sum of all the category prices, the total prices, and so on.
This is just an example of one request to explain my situation.
Now, if there is any request that can potentially take longer time for the exection, can the developer satisfy this request by breaking down the request?
Do I need to tell him why is this break down necessary?
What if he feels that I am not capable of doing it in a single shot?
Is every report that he asks for need to be in single query? or will there be any need to itake the help of PHP to proces one loop and based on the values that I get, I put some conditions to apply rules that the client wants?
What is the better way to do this kind of job?
Any views?
Thanks.
This will generally depend on the database used.
Most queries can be done in a single SELECT, but this should never stop you from looking at views, sub-selects, or stored procedures.
You should be able to handle most of your queries in this fashion, so I would recommend:
Don't let the output determine how you design the database; this might lead you down the wrong road. You need to store data in the most normalized fashion suitable to the application.
Lots of questions!
Based on the relationships that I figured out while developing tables, are there any limitations to what a client can ask?
A client can ask for anything, really. Clients aren't always rational. It's part of your job to help the client think through their needs.
What I mean to say by this is, the client will ask that he wants to list all the individual products, their counts, associated categories, all the counts per category, the products in each category and their prices, the sum of all the category prices and the total prices, and so on and so forth.
All of these queries sound possible with SQL. To list individual products, use the SELECT statement. To get a count, use COUNT. To get associated categories, use JOINs. Use SUM to get total prices.
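Those building blocks fit together in one aggregate query. Here is a runnable sketch against an in-memory SQLite database with a made-up two-table products/categories schema (the schema and data are illustrative, not the asker's actual tables):

```python
import sqlite3

# In-memory database with a made-up schema: each product belongs
# to exactly one category.
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE categories (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE products (
        id INTEGER PRIMARY KEY, name TEXT, price REAL,
        category_id INTEGER REFERENCES categories(id)
    );
    INSERT INTO categories VALUES (1, 'Books'), (2, 'Games');
    INSERT INTO products VALUES
        (1, 'SQL Primer', 20.0, 1),
        (2, 'PHP Cookbook', 30.0, 1),
        (3, 'Chess Set', 25.0, 2);
""")

# Products per category with their price total: JOIN + COUNT + SUM.
rows = db.execute("""
    SELECT c.name, COUNT(p.id), SUM(p.price)
    FROM categories c JOIN products p ON p.category_id = c.id
    GROUP BY c.id ORDER BY c.id
""").fetchall()

# Grand total over all products.
(total,) = db.execute("SELECT SUM(price) FROM products").fetchone()
```

One GROUP BY query answers the per-category counts and sums in a single round trip, which is usually faster than looping in PHP.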
Now, if there is any request that can potentially take a long time to execute, can the developer satisfy this request by breaking down the request? Do I need to tell him why this breakdown is necessary?
Yes - breaking down the request can help a client understand their needs.
What if he feels that I am not capable of doing it in a single shot?
Convince him otherwise. You don't want him thinking you're stupid if you want to keep his business. :)
Does every report that he asks for need to be a single query? Or will there be any need to take the help of PHP to process a loop and, based on the values that I get, put some conditions to apply the rules that the client wants?
Really depends on your skill level. If you know SQL well enough, you can get most of your data in one query. If you aren't as good, then you might do a few queries and then loop over them in PHP. Typically it is faster to do it all in SQL.
What is the better way to do this kind of job?
Are you working for yourself? If so, sometimes it just takes experience to figure out the best way. (and posting to stackoverflow :)
You should look at the general principles of requirement specification and represent your client's needs, for example as user stories, which are tasks that a user will wish to perform. You can then cost each user story as a unit of work. It should be possible to work on one user story at a time, so you can agree on the order in which you will deliver them.
It's best to look at each story / query as separate. This way you can add or remove functionality from the schedule depending on the needs of the client. If you spot common patterns, you can refactor them as you go along.
Many problems come from people trying to over-optimise or over-generalise. I would write each query separately unless you find that they are starting to overlap.
The people paying the bills can ask for anything they want. If what they are asking for really doesn't make sense, then try to make them see reason.
A business requirement shouldn't be changed or removed just because it might be 'hard' to implement.
Design your Database schema to reflect the domain model and normalise to at least 3NF.
Generally, aggregate queries (such as those commonly used to drive reports) can be implemented to utilise indexes and RDBMS specific features to reduce their running time.
It sounds like you just need to improve your data design skills. A properly designed/ normalized database, as astander suggests, won't run into the issues you're worried about. But it takes a lot of time to learn the Right Way to design a database if you just keep learning from mistakes. When I was starting out as a web dev, I found Database Design for Mere Mortals a huge help in showing you how to avoid painting yourself into corners. There's a companion book about how to write good queries on your databases as well. The two books won't teach you everything there is to know, but they give you a great foundation.
It sounds like you are lacking a bit in your confidence. These are the kinds of problems developers face every day. I think it's good that you recognize a possible weakness and are taking steps to improve. Take some time to learn more about databases and queries.
That said let me answer some of your questions directly:
Now, if there is any request that can potentially take a long time to execute, can the developer satisfy this request by breaking down the request?
Yes, you can break down the request. Not every request can be satisfied in a single query.
Do I need to tell him why this breakdown is necessary?
Only if he asks. As long as you meet the requirement you should be fine. If he knew how to do it a better way then why did he hire you?
What if he feels that I am not capable of doing it in a single shot?
Once again, if he knew a better way to code the database and reports, then why did he hire you?
Does every report that he asks for need to be a single query? Or will there be any need to take the help of PHP to process a loop and, based on the values that I get, put some conditions to apply the rules that the client wants?
No, not everything can be done in a single query; it depends on the complexity of the report.
The design of your relational database will have a huge impact on what can be done (in an effective way) or not. I tend to say it is the most crucial part of your application.
Your process for drawing your table design is OK, but after that you should review your different use cases and see (with just a pencil and paper) if your database design is able to cope with each of them.
You could then discuss with your client to make sure some cases will never happen (e.g.: "Can you confirm that a Product will always belong to 1 and only 1 Category?").
That said, the client can indeed ask for anything in the specs. You are free to accept, refuse, or explain to him why his specs are unrealistic. If you develop for a fixed price without clear specs, you're in a bad situation, and it's your fault...

Designing a main form ("main menu") for a WinForm application

The form that currently loads when our beta WinForm application starts up is one that shows a vast array of buttons... "Inventory", "Customers", "Reports", etc. Nothing too exciting.
I usually begin UI by looking at similar software products to see how they get done, but as this is a corporate application, I really can't go downloading other corporate applications.
I'd love to give this form a bit of polish but I'm not really sure where to start. Any suggestions?
EDIT: I am trying to come up with multiple options to present to users, however, I'm drawing blanks as well. I can find a ton of design ideas for the web, but there really doesn't seem to be much for Windows form design.
I have found that, given no options, users will have a hard time saying what they want. Once given an option, it's usually easier for them to find things to change. I would suggest making some paper sketches of potential user interfaces for your application. Then sit down with a few users and discuss around them. I would imagine that you would get more concrete ideas from the users that way.
Update
Just a couple of thoughts that may (or may not) help you get forward:
Don't get too hung up on the application being "corporate". Many corporate applications that I have seen look so boring that I feel sorry for the users that need to look at them for a good share of their day.
Look at your own favourite UIs and ask yourself why you like them.
While not getting stuck in the "corporate template", also do not get too creative; the users' collected experience comes from other applications, and it may be good if they can guess how things work without training.
Don't forget to take in inspiration from web sites that you find appealing and easy to use.
Try to find a logical "flow"; visualize things having the same conceptual functionality in a consistent way; this also helps the user do successful "guesswork".
You might look to other applications that your users are familiar with. Outlook is ubiquitous in my company, and we were able to map our application to its interface relatively easily, so we used that application as a model when developing our UI.
Note that I'm not suggesting Outlook specifically to you, just that you look for UIs that would make your users' learning curve shallower.
The problem here is that you need some good user analysis and I'm guessing you've only done functional analysis.
Because your problem is so abstract, it's hard to give one good example of what you need to do. I'd go to usability.gov and check out the usability methods link, especially card sorting and contextual interviews.
Basically you want to do two things:
1- Discover how your users think information is grouped on the page: This will help flesh out your functional requirements too. Once you've got information all grouped up, you've basically got your navigation metaphor set up. Also, you can continually do card-sorting exercises right down to page and function levels - e.g. you do one card-sorting session to understand user needs, then you take one group of cards and ask users to break that down into ranks of importance. Doing so will help you understand what needs to be in dominant areas of the screen and what can be hidden.
2- Understand what tools they already use: what they do and don't like about them. You need to get a list of tools/applications that they use, external and internal. Internal tools are probably the most important because there is a fair chance that most people in your business will share the experience of using them. External tools, however, might help give you context into how your users think.
Also, don't be afraid to get pencil and paper and sketch up ideas with users. People generally understand that sketches are a quick and useful way to help with early design work and you can get an immense amount of information out of them with just simple sketches. Yes, even do this if you suck at sketching - chances are it won't matter. In fact, crappy sketches could even work in your favour because then nobody is going to argue if buttons should be blue, red or whatever.
Frankly, a form with a “vast array of buttons” needs more than a little polish. A form dedicated solely to navigation generally means you’re giving your users unnecessary work. Provide a pulldown or sidebar menu on each form for navigating to any form.
The work area of your starting form should provide users with something to actually accomplish their tasks. Among the options are:
A “dashboard” main form, showing summarized information about the users’ work (e.g., list of accounts to review and status of each, number of orders at each stage of processing, To Do schedule). Ideally, users should be able to perform their most common tasks directly in the opening form (e.g., mark each account as “approved” or not). If further information is necessary to complete a task, links navigate to detailed forms filled with the proper query results. At the very least users should be able to assess the status of their work without going any further. Note that different groups of users may need different things on their respective dashboards.
Default form or forms. Users of a corporate application typically have specific assignments, often involving only one to three of all your forms. Users who work with Inventory, for example, may almost never need to look at Customer records, and vice versa. Users also often work on a specific subset of records. Each sales rep, for example may be assigned a small portion of the total number of customers in the database. Divide your users into groups based on the forms and records they usually use. For each user group, start the app by automatically opening the user group’s form(s) populated with the query results of their records. Users should be able to complete most of their work without any further navigation or querying.
If all else fails, open the app to whatever forms and content were last open when the user quit the app. Many corporate users will continue to work tomorrow on the same or similar stuff they’re working on today.
Analyze the tasks of your users to determine which of the above options to use. It is generally not productive to describe each option to the users and ask which they like better.
BTW, “Reports” is probably not a particularly good navigation option. It’s better if you consistently identify things primarily by what they show, rather than how they show it. Users may not know that the information they want to see is in a “report” rather than a form, but they’ll know what content they want to see. Reports on inventory are accessed under Inventory; reports about sales are accessed under Sales.
Have you tried asking your end users what they would like? After all, they are the ones who are going to be using the system.
I use components from the company DevExpress. They have some really cool controls (such as the Office 2007 ribbon), form skinning utilities (with a vast amount of different skins), and a load more...
If you want to check them out, they have 60 free components - if it's corporate, though, you might have to check the licence, but you can get it at... DevExpress 60 Free
I suggest starting with the design principles suggested by Microsoft: Windows User Experience Interaction Guidelines
Some places to get ideas for interaction designs:
Books
About Face 3 - The Essentials of Interaction Design
Don't Make Me Think (this is focused on web design, but many of the principles carry over to Windows design)
Web Sites
Windows User Experience Interaction Guidelines
In addition, many applications have free trial versions that you can download to determine how they handle user interaction. Also, don't discount items on your desktop right now.
Do you have any statistics or insights concerning what the most commonly-used or important functions might be? If so, you could use that to pare down your "vast array of buttons" and highlight only those that are most important.
That's sort of a trivial example, but the underlying point is that your understanding of your audience should inform your design, at least from a functional perspective. You might have past usage statistics, or user stories, or documented workflows, or whatever - even if you're drawing a blank right now, remember that you have to know something about your users, otherwise you wouldn't be able to write software for them.
Building on what they already know can make it easy on your users. Do they live in Outlook? Then you might want to mimic that (as Michael Petrotta suggested). Do they typically do the same thing (within a given role) every time they use the app? Then look for a simple, streamlined interface. Are they power users? Then they'll likely want to be able to tweak and customize the interface. Maybe you even have different menu forms for different user roles.
At this stage, I wouldn't worry about getting it right; just relax and put something out there. It almost doesn't matter what you design, because if you have engaged users and you give them the option, they're going to want to change something (everything?) anyway. ;-)