How do I label inputs and outputs in a GPT-2 training dataset?

This is how my dataset looks so far:
{input text}
{output text}
<|endoftext|>
{input}
...
How do I tell GPT-2 that {input text} is an input and that {output text} is the output for that input?
I tried searching for this on Google and only found info about the <|endoftext|> tag.
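GPT-2 is a plain language model with no separate input and output fields: it only learns to continue text. A common approach is therefore to mark the two parts with consistent text markers of your own choosing, end each example with <|endoftext|>, and at generation time prompt with the input followed by the output marker. A minimal Python sketch, assuming the hypothetical markers `Input:` and `Output:` (any consistent strings work; they are not special GPT-2 tokens) and hypothetical helper names:

```python
from typing import Iterable, Tuple

EOT = "<|endoftext|>"  # GPT-2's end-of-text token, written as plain text

# Hypothetical markers chosen for this sketch; GPT-2 does not define them.
INPUT_MARKER = "Input:"
OUTPUT_MARKER = "Output:"


def format_example(inp: str, out: str) -> str:
    """Render one input/output pair as a single training document."""
    return f"{INPUT_MARKER} {inp.strip()}\n{OUTPUT_MARKER} {out.strip()}\n{EOT}\n"


def build_dataset(pairs: Iterable[Tuple[str, str]]) -> str:
    """Concatenate all formatted pairs into the contents of a .txt training file."""
    return "".join(format_example(i, o) for i, o in pairs)


def make_prompt(inp: str) -> str:
    """Generation-time prompt: stop right after the output marker so the
    fine-tuned model continues with an output."""
    return f"{INPUT_MARKER} {inp.strip()}\n{OUTPUT_MARKER}"


if __name__ == "__main__":
    pairs = [("hello", "hi there"), ("how are you?", "fine, thanks")]
    print(build_dataset(pairs))
    print(make_prompt("good morning"))
```

Writing the string returned by `build_dataset` to your .txt file keeps your layout, with the markers added. After generation from `make_prompt(...)`, anything from the first <|endoftext|> onward can be cut off.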

Related

Text to speech based on time

I'm trying to write code that will convert my input text to speech when a certain time is reached, so I wrote it in the
Form1_Load function,
like this.
It's from C# Windows Forms.
ItemArray[1] holds values from a DateTimePicker and
ItemArray[2] holds values from a text box.
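if (dataGridView1.Rows.Count > 0)
{
    // Equals compares DateTime values down to the tick
    if (table.Rows[0].ItemArray[1].Equals(DateTime.Now))
    {
        // ItemArray values are typed as object; Speak() takes a string
        msg.Speak(table.Rows[0].ItemArray[2].ToString());
    }
}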
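Two things keep a check like this from firing: Form1_Load runs only once, when the form opens, and comparing against DateTime.Now with Equals almost never matches a stored time exactly. The timing logic itself is language-independent, so here is a minimal sketch of it in Python (not C#). It assumes times are compared to the minute; `poll_once` and `to_minute` are hypothetical helpers, `speak` stands in for `msg.Speak`, and the repeated polling would roughly correspond to a Windows Forms Timer's Tick handler:

```python
from datetime import datetime
from typing import Callable, Iterable, Set, Tuple

Entry = Tuple[datetime, str]  # (scheduled time, text to speak)


def to_minute(t: datetime) -> datetime:
    """Drop seconds and microseconds so times are compared per minute."""
    return t.replace(second=0, microsecond=0)


def poll_once(schedule: Iterable[Entry], now: datetime,
              speak: Callable[[str], None], spoken: Set[Entry]) -> None:
    """Speak every entry due in the current minute, at most once each.

    Meant to be called repeatedly (e.g. every few seconds from a timer),
    not just once at startup.
    """
    current = to_minute(now)
    for entry in schedule:
        when, text = entry
        if to_minute(when) == current and entry not in spoken:
            speak(text)
            spoken.add(entry)  # remember it so later polls skip it


if __name__ == "__main__":
    schedule = [(datetime(2024, 1, 1, 9, 30, 15), "Time for the meeting")]
    spoken: Set[Entry] = set()
    # Two polls within the same minute: the text is "spoken" (printed) only once.
    poll_once(schedule, datetime(2024, 1, 1, 9, 30, 2), print, spoken)
    poll_once(schedule, datetime(2024, 1, 1, 9, 30, 45), print, spoken)
```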

How do I create pairs of images and captions in OpenAI CLIP format?

I have the following problem. I want to make training data for OpenAI's DALL-E in the CLIP format, i.e. pairs of images and text (captions) with the same filename. I've tried to do that in the code below, but it doesn't work as intended. The function "caption_image" creates 5 captions for each image in the folder and prints them, but when I try to create a txt file with the 5 correct captions for each image, all the text files somehow end up with just 1 caption, and it's the same caption for all the images in the folder.
Can anyone see what the mistake is in the code below?
Thanks a lot. I'm new to coding, and I hope you can help.
import os
from google.colab import drive
drive.mount('/content/drive')
images = []
path = '/content/drive/My Drive/Godard_imgs/Eloge_jpgs/eloge_sample/'
for filename in [filename for filename in os.listdir(path) if filename.endswith(".png") or filename.endswith(".jpg")]:
    image = os.path.join(path, filename)
    images.append(image)
for img in images:
    captionss = caption_image(img, args, net, preprocess)
    for caption in captionss:
        for filename in [filename for filename in os.listdir(path) if filename.endswith(".png") or filename.endswith(".jpg")]:
            with open("{}.txt".format(filename), "w") as f:
                f.write(caption + '\n')
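The symptom follows from the nested loops: for every caption of every image, the innermost loop reopens every .txt file in mode "w", which truncates it, so when the loops finish each file holds only the last caption of the last image. The files also land in the current working directory with names like `frame.png.txt`. A minimal sketch of a fix that opens one file per image and writes all of that image's captions; `caption_fn`, `list_images`, and `write_caption_files` are hypothetical names, and naming each text file after the image's stem (`frame.txt` next to `frame.png`) is an assumption about the pairing you want:

```python
import os
from typing import Callable, List

IMAGE_EXTS = (".png", ".jpg")


def list_images(folder: str) -> List[str]:
    """Full paths of the .png/.jpg files in `folder`, sorted by name."""
    return sorted(
        os.path.join(folder, name)
        for name in os.listdir(folder)
        if name.lower().endswith(IMAGE_EXTS)
    )


def write_caption_files(folder: str, caption_fn: Callable[[str], List[str]]) -> None:
    """Write each image's captions, one per line, to a .txt file with the same stem."""
    for img_path in list_images(folder):
        captions = caption_fn(img_path)  # captions for THIS image only
        stem, _ext = os.path.splitext(img_path)
        # Open the file once per image, so its captions are not overwritten
        with open(stem + ".txt", "w", encoding="utf-8") as f:
            for caption in captions:
                f.write(caption + "\n")
```

With the names from the question, the call would be `write_caption_files(path, lambda img: caption_image(img, args, net, preprocess))`.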

How to read the content of different files (.txt, .pdf, .docx) using the ReactFileReader component in React JS

I want to read the content of an uploaded file (with different file extensions, e.g. .txt, .docx, .pdf) in React JS. My code is below. I am using the ReactFileReader component. With my code it can only read the content of a txt file; it can't read the content of a pdf or docx. How do I solve this? Please help me. Thank you.
import React, { Component } from "react";
import ReactFileReader from 'react-file-reader';
class DisplayController extends Component {
  constructor(props){
    super(props)
    this.state = {
      value: '',
      file : ""
    }
  }
  handleFiles = files => {
    let reader = new FileReader();
    reader.onload = function () {
      alert("Read Data : " + reader.result)
    }
    reader.readAsText(files[0])
  }
  render() {
    return (
      <form>
        <div className="files">
          <ReactFileReader fileTypes={['.pdf','.txt','.docx']} handleFiles={this.handleFiles}>
            <button className='btn'>Upload</button>
          </ReactFileReader>
        </div>
      </form>
    )
  }
}
export default DisplayController;

It's very complicated, but not impossible, to read the contents of a docx file: that file is a .zip archive containing many other files, which in turn contain XML markup describing the document's contents.
But this is not usually done in a browser, as none of the tools required for it ship with browsers by default. You would probably need dozens of additional libraries to deal with that.
Something like that should probably be done on a server.
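Since the answer suggests doing this on a server, here is a minimal server-side sketch in Python using only the standard library. It assumes a regular .docx whose body text sits in `word/document.xml`, with paragraphs in `w:p` elements and text runs in `w:t` elements of the WordprocessingML namespace; formatting, images, headers, footers, and footnotes are ignored, and table cells come out as plain paragraphs. `docx_paragraphs` is a hypothetical helper name, and getting the uploaded file from the React front end to the server is not shown.

```python
import io
import xml.etree.ElementTree as ET
import zipfile
from typing import List, Union

# WordprocessingML main namespace used by word/document.xml
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def docx_paragraphs(source: Union[str, bytes]) -> List[str]:
    """Return the plain text of each paragraph in a .docx file.

    `source` is either a file path or the raw bytes of an upload.
    """
    data = io.BytesIO(source) if isinstance(source, bytes) else source
    with zipfile.ZipFile(data) as zf:  # a .docx is a zip archive
        root = ET.fromstring(zf.read("word/document.xml"))
    paragraphs = []
    for p in root.iter(f"{{{W_NS}}}p"):  # every paragraph element
        # Join the text of all runs in this paragraph
        paragraphs.append("".join(t.text or "" for t in p.iter(f"{{{W_NS}}}t")))
    return paragraphs
```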
It is, however, almost completely impossible to read the contents of a pdf. A pdf can take many forms, and in the worst case it has no character strings embedded at all, only glyphs or little images of characters, along with coordinates for every single character.
Unless you know the exact tool the pdf was created with and exactly how such a file looks internally, it is not feasible to parse it into text.
You could look into using a component to display the pdf to your user instead, if that somehow matches your use case. That should be possible.

Collecting tweets using tweepy Cursor

I want to collect tweets (in Python dictionary format) containing some keywords into a csv file. I have used tweepy's Cursor, but it returns nothing.
Following the answer in "Managing Tweepy API Search", I have tried to collect tweets using Cursor, but the code stops after a few seconds without returning any message.
import TwitterCredentials
import tweepy
import csv
from tweepy import OAuthHandler
import json
def authenticate():
    auth=OAuthHandler(TwitterCredentials.consumerKey,
                      TwitterCredentials.consumerSecretKey)
    auth.set_access_token(TwitterCredentials.accessToken,
                          TwitterCredentials.accessTokenSecret)
    api=tweepy.API(auth)
    return api
def collectTweet(api, query, max_tweets):
    i=1
    for tweet in tweepy.Cursor(api.search, q=query).items(max_tweets):
        loadCsvFile(json.loads(tweet))
        print(str(i)+ " ")
        i+=1
def loadCsvFile(tweet):
    csv_file.writerow([tweet['id'],tweet['created_at'],tweet['text'],
                       tweet['retweet_count'],tweet['source']])
if __name__ == '__main__':
    query=['air pollution', 'PM 2.5']
    max_tweets=500
    f=open('collected_tweets.csv', 'w')
    csv_file=csv.writer(f)
    csv_file.writerow(['id','created_at','text',
                       'retweet_count','source'])
    api=authenticate()
    collectTweet(api, query, max_tweets)
I want to get the messages in dictionary format so that the id, created_at, text, and source information can be extracted from them.
This code didn't return any error, and it didn't return any message either.

tweepy.Cursor returns Status objects, and each one's _json attribute is a dict containing all the fields of the tweet. So the code should be:
for status in tweepy.Cursor(api.search, q=query, lang='en').items(max_tweets):
    loadCsvFile(status._json)
    ...
Then it worked.
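Two other details may also matter here. The query is passed as a Python list, while the search endpoint expects a single query string; Twitter's search syntax accepts quoted phrases combined with OR. The CSV file is also never closed, so opening it in a `with` block (with `newline=""`, as the `csv` module documentation recommends) ensures the rows are flushed. A minimal sketch, assuming the tweets arrive as the `status._json` dicts from the answer above; `build_query` and `write_tweets` are hypothetical helpers:

```python
import csv
from typing import Iterable, List, Mapping, TextIO

FIELDS = ["id", "created_at", "text", "retweet_count", "source"]


def build_query(keywords: List[str]) -> str:
    """Join keywords into one search string: each phrase quoted, OR-ed together."""
    return " OR ".join(f'"{k}"' for k in keywords)


def write_tweets(tweets: Iterable[Mapping], out: TextIO) -> int:
    """Write the selected fields of each tweet dict as a CSV row; return the row count."""
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    count = 0
    for tweet in tweets:
        # Keep only the wanted fields; missing ones become empty cells
        writer.writerow({k: tweet.get(k) for k in FIELDS})
        count += 1
    return count
```

With tweepy, this could be used inside `with open('collected_tweets.csv', 'w', newline='', encoding='utf-8') as f:` as `write_tweets((s._json for s in tweepy.Cursor(api.search, q=build_query(query), lang='en').items(max_tweets)), f)`. In Tweepy 4.x, `API.search` was renamed `API.search_tweets`.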

Database/Excel import and export format

We have a database program that can export and import Excel sheets. The Excel sheets it exports are formatted as text; I've noticed you cannot change the formatting in these exports without performing Text to Columns on the data first.
When I edit an export for import, or create a new import file, I format the cells as text, but it doesn't upload nicely: it rounds numbers and skips some data.
I would like to know how to format the import file the same way as the export file, but I am not familiar with what kind of format behaves the way I described. I've written a bit of VBA to create the import file from data in a workbook, so applying the formatting in VBA would be ideal. Any insight would be much appreciated.

What is the 'database program'? If this is an old legacy tool, you may have few options, or none at all. If it is Access, you can write some VBA to format the exported Excel report.
'Late binding: declare as Object so no reference to the Excel library is needed
Dim xl As Object, XlBook As Object, xlsheet1 As Object
'Excel's xlRight constant is not defined under late binding
Const xlRight As Long = -4152
'This deals with Excel already being open or not
On Error Resume Next
Set xl = GetObject(, "Excel.Application")
On Error GoTo 0
If xl Is Nothing Then
    Set xl = CreateObject("Excel.Application")
End If
'filename is the string with the link to the file ("C:/....blahblah.xls")
Set XlBook = xl.Workbooks.Open(filename)
'Make sure excel is visible on the screen
xl.Visible = True
XlBook.Windows(1).Visible = True
'xl.ActiveWindow.Zoom = 75
'Define the sheet in the Workbook as XlSheet
Set xlsheet1 = XlBook.Worksheets(1)
'Then have some fun!
With xlsheet1
    .Range("A1") = "some data here"
    .Columns("A:A").HorizontalAlignment = xlRight
    .Rows("1:1").Font.Bold = True
End With
'And so on...
You can't format data imported into an Access table, but you can add some formatting to an Access report or Form (I doubt you want to do this).