I am working with a Word file using python-docx, and I want to copy an Excel table, with its style, into this open document (the Word file).
I found this code on the internet, but here the doc file is opened with word.Documents.Open('C:\Users...\test.docx'):
from win32com import client

# Start COM instances of Excel and Word
excel = client.Dispatch("Excel.Application")
word = client.Dispatch("Word.Application")
doc = word.Documents.Open('C:\\Users\...\test.docx')
book = excel.Workbooks.Open('C:\\Users\...\test.xlsx')
sheet = book.Worksheets(1)
# Copy the source range to the clipboard
sheet.Range("A1:O11").Copy()
# Arguments are (LinkedToExcel, WordFormatting, RTF)
doc.Content.PasteExcelTable(False, False, False)
while my current code, using python-docx, is just:
from docx import Document
document = Document()
I would be more than grateful for any response. Also, all other suggestions regarding table styles are appreciated.
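One way I picture combining the two, as an untested sketch (all paths here are placeholders): build and save the document with python-docx first, then let Word paste the styled Excel range via COM at the end of the document:

from docx import Document
from win32com import client

# Build and save the document with python-docx first
document = Document()
document.add_paragraph('Report')
document.save('C:\\temp\\test.docx')  # placeholder path

# Then reopen it through Word's COM interface and paste the Excel range
excel = client.Dispatch("Excel.Application")
word = client.Dispatch("Word.Application")
doc = word.Documents.Open('C:\\temp\\test.docx')    # placeholder path
book = excel.Workbooks.Open('C:\\temp\\test.xlsx')  # placeholder path
book.Worksheets(1).Range("A1:O11").Copy()

# Paste at the very end of the document, keeping the Excel styling
end = doc.Content.End - 1
doc.Range(end, end).PasteExcelTable(False, False, False)
doc.Save()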
I followed a few tutorials to set up a Google Docs template for a job offer letter. I made mail merge fields in the Google Doc set up as "{{FirstName}}", "{{StartDate}}", etc. I have a Google Sheet and a Google Form. I made a sheet called "Fields" where in column D, I have the "merge fields" and then in Column E, I have the values. I have been pasting lines of code to replace the merge fields with values, one-by-one. I saw a solution in another forum about how to do a loop. But after hours of practice, I couldn't get it working.
Ideally, I would like to maintain the merge fields in a Google Sheet, so that I can add more. It would be lovely if I could figure out how to write an Apps Script to read any merge fields on my Google Sheet and then do the replaceText function automatically.
Right now, I have the code working one-by-one, but every time I add merge fields, I have to edit the code to find it in my Google Sheet.
I saw some code here:
https://stackoverflow.com/questions/69545550/how-do-you-loop-through-an-object-and-replace-text#:~:text=Loop%20through%20values%20object%20by%20replacing%20the%20body.replaceText%20entries%20in%20your%20code%20with%20this%3A
that I was trying to replicate as a loop, but I couldn't get it working :(
(Screenshots from the original post: the Google Sheet and the Google Docs template.)
const sheet = SpreadsheetApp
  .getActiveSpreadsheet()
  .getSheetByName('Form')
//Copy template documents in our destinationFolder
const copy = googleDocTemplate.makeCopy(sheet.getRange('Fields!E3').getValues()+' '+sheet.getRange('Fields!E4').getValues()+` - Offer Letter` , destinationFolder)
//Once we have the copy, we then open it using the DocumentApp
const doc = DocumentApp.openById(copy.getId())
//Create constants for the Google Doc Body and Header so we can use replaceText
const body = doc.getBody();
const docHeader = doc.getHeader().getParent();
//Replace text in Headers of Google Doc
docHeader.replaceText(sheet.getRange('Fields!D3').getValues(), sheet.getRange('Fields!E3').getValues()); //First Name
docHeader.replaceText(sheet.getRange('Fields!D4').getValues(), sheet.getRange('Fields!E4').getValues()); //Last Name
//Replace body text
body.replaceText(sheet.getRange('Fields!D2').getValues(), sheet.getRange('Fields!E2').getValues());
body.replaceText(sheet.getRange('Fields!D3').getValues(), sheet.getRange('Fields!E3').getValues());
body.replaceText(sheet.getRange('Fields!D4').getValues(), sheet.getRange('Fields!E4').getValues());
body.replaceText(sheet.getRange('Fields!D5').getValues(), sheet.getRange('Fields!E5').getValues());
body.replaceText(sheet.getRange('Fields!D6').getValues(), sheet.getRange('Fields!E6').getValues());
body.replaceText(sheet.getRange('Fields!D7').getValues(), sheet.getRange('Fields!E7').getValues());
body.replaceText(sheet.getRange('Fields!D8').getValues(), sheet.getRange('Fields!E8').getValues());
I created a sheet and template as shown above. Now the following script replaces all the placeholders with the text. Notice I deliberately leave one out.
function replaceText() {
  try {
    let spread = SpreadsheetApp.getActiveSpreadsheet();
    let sheet = spread.getSheetByName("Sheet1");
    let values = sheet.getDataRange().getDisplayValues();
    values.shift(); // remove header
    let file = DriveApp.getFileById("1up..........").makeCopy(values[1][4]+" "+values[2][4]+" - Letter");
    let doc = DocumentApp.openById(file.getId());
    let body = doc.getBody();
    values.forEach( row => {
      body.replaceText(row[3], row[4]);
    });
  }
  catch(err) {
    console.log(err);
  }
}
With the following results (result screenshot omitted).
Reference:
Array.forEach()
Body.replaceText()
I have to create an email and attach an XLSX file. I looked at the BCS_EXAMPLE_7 program.
I have transformed the content with the following method:
TRY.
    cl_bcs_convert=>string_to_solix(
      EXPORTING
        iv_string   = lv_content
        iv_codepage = '4103'
        iv_add_bom  = 'X'
      IMPORTING
        et_solix = pt_binary_content
        ev_size  = pv_size ).
  CATCH cx_bcs.
    ls_return-type    = text-023.
    ls_return-message = text-024.
    APPEND ls_return TO pt_return.
ENDTRY.
CONCATENATE lv_save_file_name '_' sy-datum '.xlsx' INTO lv_save_file_name.
lv_attachment_subject = lv_save_file_name.
CONCATENATE '&SO_FILENAME=' lv_attachment_subject INTO ls_attachment_header.
APPEND ls_attachment_header TO lt_attachment_header.
lo_document->add_attachment( i_attachment_type    = 'XLS'
                             i_attachment_subject = lv_attachment_subject
                             i_attachment_size    = pv_size
                             i_att_content_hex    = pt_binary_content
                             i_attachment_header  = lt_attachment_header ).
The email is sent correctly, but when I open the attachment I see the error:
Cannot open the file because the file extension is incorrect
Could you help me? Thanks.
That's normal behavior of Excel, unrelated to ABAP: the file name has the extension .xlsx but the file doesn't contain data in the format corresponding to XLSX. Excel does the same kind of check for other extensions. If you need more information about these checks, please search the Web.
As I see that your program creates the attachment from text converted into the UTF-16LE code page (SAP code page 4103), I guess that you created the Excel data in CSV format, tab-separated values, or even the old Excel XMLSS/XML 2003 format.
In that case, the extension .xlsx is not valid; to avoid the message, use the adequate extension: respectively .csv, .txt or .xml.
If you really need the extension .xlsx for some reason, then you must create the data in XLSX format. You may use the free API abap2xlsx. If you need further assistance about how to use abap2xlsx, please ask a new question (unrelated to email).
NB: maybe you were told to use the extension .xlsx although there is no real need for it (each format has its own features, but simple unformatted values can be achieved with all of them); in that case you may propose to use a simple format like CSV or tab-separated values.
NB: you may also hit the opposite case, where Excel sniffs that the file contains data in XLSX format but the file name doesn't have the extension .xlsx, and the same for all other formats, but I can't say what the exact Excel reaction is in each case.
It appears that whatever you have in lv_content isn't actually a valid Excel file. You cannot just take arbitrary data, give it the extension .xlsx, and expect MS Excel to know what to do with it.
Unfortunately, creating valid MS Office files is anything but trivial. The format is theoretically open and based on XML (actually a zip archive containing multiple XML files), but in practice the specification is over 5,000(!) pages long.
Fortunately, there is a library for that: abap2xlsx is an open source (Apache License) library which provides an easy API to create (and read) valid XLSX files in ABAP.
You could also try to open the file with a text editor (e.g. Notepad++); maybe this gives a hint about the actual content.
But I guess that something went wrong generating the binary table. Maybe you are using the wrong file size or code page.
Possible problems:
First: as Sandra correctly said, you may have invalid content in your lv_content variable, which doesn't correspond to a correct XLSX structure.
Second (which you already solved, as seen from your code): BCS classes do not support 4-character extensions.
Here is a sample of how to build and send a correct XLSX file via mail:
SELECT * UP TO 100 ROWS
  FROM spfli
  INTO TABLE @DATA(lt_spfli).

cl_salv_table=>factory( IMPORTING r_salv_table = DATA(lr_table)
                        CHANGING  t_table      = lt_spfli ).

DATA: lr_xldimension TYPE REF TO if_ixml_node,
      lr_xlworksheet TYPE REF TO if_ixml_element.

DATA(lv_xlsx) = lr_table->to_xml( if_salv_bs_xml=>c_type_xlsx ).
DATA(lr_zip) = NEW cl_abap_zip( ).
lr_zip->load( lv_xlsx ).
lr_zip->get( EXPORTING name = 'xl/worksheets/sheet1.xml' IMPORTING content = DATA(lv_file) ).

DATA(lr_file) = NEW cl_xml_document( ).
lr_file->parse_xstring( lv_file ).

* Row elements are under SheetData
DATA(lr_xlnode) = lr_file->find_node( 'sheetData' ).
DATA(lr_xlrows) = lr_xlnode->get_children( ).

* Create new element in the XML file
lr_xlworksheet ?= lr_file->find_node( 'worksheet' ).
DATA(lr_xlsheetpr) = cl_ixml=>create( )->create_document( )->create_element( name = 'sheetPr' ).
DATA(lr_xloutlinepr) = cl_ixml=>create( )->create_document( )->create_element( name = 'outlinePr' ).
lr_xlsheetpr->if_ixml_node~append_child( lr_xloutlinepr ).
lr_xloutlinepr->set_attribute( name = 'summaryBelow' value = 'false' ).
lr_xldimension ?= lr_file->find_node( 'dimension' ).
lr_xlworksheet->if_ixml_node~insert_child( new_child = lr_xlsheetpr ref_child = lr_xldimension ).

* Create xstring and move it to XLSX
lr_file->render_2_xstring( IMPORTING stream = lv_file ).
lr_zip->delete( EXPORTING name = 'xl/worksheets/sheet1.xml' ).
lr_zip->add( EXPORTING name = 'xl/worksheets/sheet1.xml' content = lv_file ).
lv_xlsx = lr_zip->save( ).

DATA lv_size TYPE i.
DATA lt_bintab TYPE solix_tab.

* Convert to binary
CALL FUNCTION 'SCMS_XSTRING_TO_BINARY'
  EXPORTING
    buffer        = lv_xlsx
  IMPORTING
    output_length = lv_size
  TABLES
    binary_tab    = lt_bintab.

DATA main_text TYPE bcsy_text.

* Create persistent send request
DATA(send_request) = cl_bcs=>create_persistent( ).

* Create document object from internal table with text
APPEND 'Valid Excel file' TO main_text.
DATA(document) = cl_document_bcs=>create_document( i_type = 'RAW' i_text = main_text i_subject = 'Test Created for stella' ).

DATA lt_att_head TYPE soli_tab.
APPEND '&SO_FILENAME=MySheet.xlsx' TO lt_att_head.

* Add the spreadsheet as attachment to the document object
document->add_attachment(
  i_attachment_type    = 'xls'
  i_attachment_subject = 'MySheet'
  i_attachment_size    = CONV so_obj_len( lv_size )
  i_attachment_header  = lt_att_head
  i_att_content_hex    = lt_bintab ).

send_request->set_document( document ).
DATA(recipient) = cl_cam_address_bcs=>create_internet_address( 'some_recipient@mail.com' ).
send_request->add_recipient( recipient ).
DATA(sent_to_all) = send_request->send( i_with_error_screen = 'X' ).
COMMIT WORK.
In my application, there are two buttons (export & import). Using the export button I can download an Excel file; in that downloaded file there are lots of blank fields.
I need to upload the same file using the import button, but if I upload it as-is I get many mandatory-field validation messages because of the blank fields. Instead of uploading the same file, I have another Excel file in which every blank field is filled.
In the exported file there is a unique id which is generated at run time, and I have to set the same unique id in the other Excel file (the one in which every blank field is filled, but whose unique id is different); otherwise, I will get a validation message.
I want to replace the epricer quote number 8766876 with 4181981 in the new import file.
Possibly a duplicate of the following question:
update excel cell in selenium webdriver
Possible solution from that link:
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.InputStream;
import org.apache.poi.ss.usermodel.Cell;
import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.ss.usermodel.Sheet;
import org.apache.poi.ss.usermodel.Workbook;
import org.apache.poi.ss.usermodel.WorkbookFactory;

InputStream inp = new FileInputStream("workbook.xls");
Workbook wb = WorkbookFactory.create(inp);
Sheet sheet = wb.getSheetAt(0);
Row row = sheet.getRow(2);
Cell cell = row.getCell(3);
if (cell == null)
    cell = row.createCell(3);
cell.setCellType(Cell.CELL_TYPE_STRING);
cell.setCellValue("a test");

// Write the output to a file
FileOutputStream fileOut = new FileOutputStream("workbook.xls");
wb.write(fileOut);
fileOut.close();
You cannot update Excel files using Selenium; Selenium is a series of libraries used to drive a browser, and they cannot interact with Excel documents.
If you want to interact with Excel documents you will need to use a library specifically designed for that. Some options are:
Apache POI
DocX4J
JExcelAPI
I am writing an application that works with Excel files. So far I have been using GemBox.Spreadsheet to work with them. However, I discovered that with GemBox.Spreadsheet I can save pictures to Excel files, but not retrieve them. Can anyone recommend how to retrieve a picture from an Excel file? Thank you.
Here is how you can retrieve an image from an Excel file with GemBox.Spreadsheet:
ExcelFile workbook = ExcelFile.Load("Sample.xlsx");
ExcelWorksheet worksheet = workbook.Worksheets.ActiveWorksheet;
// Select Picture element.
ExcelPicture picture = worksheet.Pictures[0];
// Import to PictureBox control.
this.pictureBox1.Image = Image.FromStream(picture.PictureStream);
// Or write to file.
File.WriteAllBytes("Sample.png", picture.PictureStream.ToArray());
The Problem
I use a tool at work that lets me do queries and get back HTML tables of info. I do not have any kind of back-end access to it.
A lot of this info would be much more useful if I could put it into a spreadsheet for sorting, averaging, etc. How can I screen-scrape this data to a CSV file?
My First Idea
Since I know jQuery, I thought I might use it to strip out the table formatting onscreen, insert commas and line breaks, and just copy the whole mess into notepad and save as a CSV. Any better ideas?
The Solution
Yes, folks, it really was as easy as copying and pasting. Don't I feel silly.
Specifically, when I pasted into the spreadsheet, I had to select "Paste Special" and choose the format "text." Otherwise it tried to paste everything into a single cell, even if I highlighted the whole spreadsheet.
Select the HTML table in your tool's UI and copy it into the clipboard (if that's possible).
Paste it into Excel.
Save as a CSV file.
However, this is a manual solution, not an automated one.
Using Python:
For example, imagine you want to scrape forex quotes in CSV form from a site like fxquotes.
Then...
from BeautifulSoup import BeautifulSoup
import urllib,string,csv,sys,os
from string import replace
date_s = '&date1=01/01/08'
date_f = '&date=11/10/08'
fx_url = 'http://www.oanda.com/convert/fxhistory?date_fmt=us'
fx_url_end = '&lang=en&margin_fixed=0&format=CSV&redirected=1'
cur1,cur2 = 'USD','AUD'
fx_url = fx_url + date_f + date_s + '&exch=' + cur1 +'&exch2=' + cur1
fx_url = fx_url +'&expr=' + cur2 + '&expr2=' + cur2 + fx_url_end
data = urllib.urlopen(fx_url).read()
soup = BeautifulSoup(data)
data = str(soup.findAll('pre', limit=1))
data = replace(data,'[<pre>','')
data = replace(data,'</pre>]','')
file_location = '/Users/location_edit_this'
file_name = file_location + 'usd_aus.csv'
file = open(file_name,"w")
file.write(data)
file.close()
Edit: to get values from a table, here is an example from palewire:
from mechanize import Browser
from BeautifulSoup import BeautifulSoup

mech = Browser()
url = "http://www.palewire.com/scrape/albums/2007.html"
page = mech.open(url)
html = page.read()
soup = BeautifulSoup(html)
table = soup.find("table", border=1)

for row in table.findAll('tr')[1:]:
    col = row.findAll('td')
    rank = col[0].string
    artist = col[1].string
    album = col[2].string
    cover_link = col[3].img['src']
    record = (rank, artist, album, cover_link)
    print "|".join(record)
This is my python version using the (currently) latest version of BeautifulSoup which can be obtained using, e.g.,
$ sudo easy_install beautifulsoup4
The script reads HTML from the standard input, and outputs the text found in all tables in proper CSV format.
#!/usr/bin/python
from bs4 import BeautifulSoup
import sys
import re
import csv

def cell_text(cell):
    return " ".join(cell.stripped_strings)

soup = BeautifulSoup(sys.stdin.read())
output = csv.writer(sys.stdout)

for table in soup.find_all('table'):
    for row in table.find_all('tr'):
        col = map(cell_text, row.find_all(re.compile('t[dh]')))
        output.writerow(col)
    output.writerow([])
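For example, assuming the script is saved as table2csv.py and you have a saved copy of the page (file names here are placeholders):

python table2csv.py < page.html > tables.csv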
Even easier (because it saves it for you for next time) ...
In Excel
Data/Import External Data/New Web Query
will take you to a URL prompt. Enter your URL, and it will delimit the available tables on the page to import. Voila.
Two ways come to mind (especially for those of us that don't have Excel):
Google Spreadsheets has an excellent importHTML function:
=importHTML("http://example.com/page/with/table", "table", index)
Index starts at 1
I recommend a copy and paste values shortly after import
File -> Download as -> CSV
Python's superb Pandas library has handy read_html and to_csv functions
Here's a basic Python3 script that prompts for the URL, which table at that URL, and a filename for the CSV.
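Since the script itself isn't reproduced above, here is a minimal sketch of what it describes (assuming pandas and lxml are installed; the prompts and filename handling are illustrative):

import pandas as pd

url = input("URL containing the table: ")
table_index = int(input("Table index on the page (starting at 0): "))
csv_name = input("Output CSV filename: ")

# read_html returns a list of DataFrames, one per <table> found on the page
tables = pd.read_html(url)
tables[table_index].to_csv(csv_name, index=False)
print("Wrote", csv_name)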
Quick and dirty:
Copy out of browser into Excel, save as CSV.
Better solution (for long term use):
Write a bit of code in the language of your choice that will pull the html contents down, and scrape out the bits that you want. You could probably throw in all of the data operations (sorting, averaging, etc) on top of the data retrieval. That way, you just have to run your code and you get the actual report that you want.
It all depends on how often you will be performing this particular task.
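For example, a rough sketch of that long-term approach in Python with pandas (the URL, table choice, and data operations are all placeholders):

import pandas as pd

# Placeholder URL; read_html pulls every <table> on the page into DataFrames
tables = pd.read_html("http://example.com/report")
df = tables[0]

# The data operations ride on top of the retrieval:
df = df.sort_values(df.columns[0])     # sorting
print(df.mean(numeric_only=True))      # averaging the numeric columns
df.to_csv("report.csv", index=False)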
Excel can open a http page.
Eg:
Click File, Open
Under filename, paste the URL of the page containing the table.
Click ok
Excel does its best to convert the html to a table.
It's not the most elegant solution, but it does work!
Basic Python implementation using BeautifulSoup, also considering both rowspan and colspan:
from BeautifulSoup import BeautifulSoup

def table2csv(html_txt):
    csvs = []
    soup = BeautifulSoup(html_txt)
    tables = soup.findAll('table')

    for table in tables:
        csv = ''
        rows = table.findAll('tr')
        row_spans = []
        do_ident = False

        for tr in rows:
            cols = tr.findAll(['th','td'])

            for cell in cols:
                colspan = int(cell.get('colspan',1))
                rowspan = int(cell.get('rowspan',1))

                if do_ident:
                    do_ident = False
                    csv += ','*(len(row_spans))

                if rowspan > 1: row_spans.append(rowspan)

                csv += '"{text}"'.format(text=cell.text) + ','*(colspan)

            if row_spans:
                for i in xrange(len(row_spans)-1,-1,-1):
                    row_spans[i] -= 1
                    if row_spans[i] < 1: row_spans.pop()

            do_ident = True if row_spans else False
            csv += '\n'

        csvs.append(csv)
        #print csv

    return '\n\n'.join(csvs)
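Usage could look like this (Python 2, matching the snippet; page.html is a placeholder for a saved copy of the page):

print table2csv(open('page.html').read())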
Here is a tested example that combines grequests and BeautifulSoup to download large quantities of pages from a structured website:
#!/usr/bin/python
from bs4 import BeautifulSoup
import sys
import re
import csv
import grequests
import time

def cell_text(cell):
    return " ".join(cell.stripped_strings)

def parse_table(body_html):
    soup = BeautifulSoup(body_html)
    for table in soup.find_all('table'):
        for row in table.find_all('tr'):
            col = map(cell_text, row.find_all(re.compile('t[dh]')))
            print(col)

def process_a_page(response, *args, **kwargs):
    parse_table(response.content)

def download_a_chunk(k):
    chunk_size = 10  # number of html pages per chunk
    x = "http://www.blahblah....com/inclusiones.php?p="
    x2 = "&name=..."
    URLS = [x + str(i) + x2 for i in range(k * chunk_size, (k + 1) * chunk_size)]
    reqs = [grequests.get(url, hooks={'response': process_a_page}) for url in URLS]
    resp = grequests.map(reqs, size=10)

# download slowly so the server does not block you
for k in range(0, 500):
    print("downloading chunk ", str(k))
    download_a_chunk(k)
    time.sleep(11)
Have you tried opening it with Excel?
If you save a spreadsheet in Excel as HTML, you'll see the format Excel uses.
From a web app I wrote, I spit out this HTML format so the user can export to Excel.
If you're screen scraping and the table you're trying to convert has a given ID, you could always do a regex parse of the HTML, along with some scripting, to generate a CSV.
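A rough sketch of that regex approach in Python (the table id "report" and the file names are made-up placeholders; a real HTML parser is more robust):

import re
import csv

html = open('page.html').read()  # placeholder input file

# Grab just the table with the known id (assumed here to be "report")
match = re.search(r'<table[^>]*id="report".*?</table>', html, re.S)

if match:
    with open('report.csv', 'w') as f:  # placeholder output file
        writer = csv.writer(f)
        for tr in re.findall(r'<tr.*?</tr>', match.group(0), re.S):
            cells = re.findall(r'<t[dh][^>]*>(.*?)</t[dh]>', tr, re.S)
            # Strip any tags left inside each cell before writing the row
            writer.writerow([re.sub(r'<[^>]+>', '', c).strip() for c in cells])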