Examples¶
The examples are split in two parts: the selection of files and the standard interface for reading/writing files.
Selecting Files¶
The file selection provides multiple ways for retrieving the desired files. All selecting functions contain three possibility to match:
- file_ending: matching 100% of the part after the last
.
- pattern: standard pattern for finding fixed strings with wildcards (like
SomeFixedName_*.csv
with * representing all kinds of string) - regex: a standard regular_expression (regex) matching the filename
The easiest way to get the latest file matching containing the name DataSource1
in the beginning and which is a .csv
file:
from fil_io import select
# Explicit way
file_name = select.get_newest_file_from_directory(
directory="path/to/directory",
pattern="DataSource1*",
file_ending="csv"
)
# Shortened way
file_name = select.get_newest_file_from_directory(
directory="path/to/directory",
pattern="DataSource1*.csv"
)
File reading / writing¶
The library provides a standardized way of interacting with files. For every file-type in the file_IO subpackage, there exist load- & write-functions following the same pattern. Only exception is the xls module due to the characteristics of sheets.
Examples are given mostly with csv module, switch the csv
to whatever submodule/-package you need.
All-in-one/doing-all-the-magic loading functions¶
The most easy way to load data is with the load-type function. It is a shortcut for the specific ways of loading data in each file-type specific module:
from fil_io import csv
data = csv.load(path="path/to/file.csv")
# data is list of lists representing the csv file
data = csv.load(path="directory/of/multiple/files")
# data is dictionary representing all csv files with {file_name: file_content}
The most easy way to write data is with the write-type function. It is again a shortcut to file-type specific modules:
# data is written to the csv file
from fil_io import csv
csv.write(file_name="path/to/file.csv", data=data_to_write)
# data is written to the json file
from fil_io import json
json.write(file_name="path/to/file.json", data=data_to_write)
File-type specific modules: advanced reading/writing¶
For every file-type exist more specific functions for reading & writing the data. The presented examples from above are redirecting to the most general functions in the packages.
If using a IDE, the implemented functions will be shown to you after importing the file-specific module directly with typing csv.
and hitting tab.
If in interactive mode, simply run csv.__all__
.
Reading¶
The reading of the files is fairly simple
from fil_io import csv
# load single csv file
data = csv.load_single(file_name="path/to/file.csv")
# data is representing the csv file
# load specific list of csv files
data = csv.load_these(file_name_list=["path/to/file1.csv", "path/to/file2.csv"])
# data is representing both csv files; {file_name: file_content}
# load all csv files from a directory
data = csv.load_all(directory="/path/to/directory")
# data is representing all csv files of this directory; {file_name: file_content}
# doing all of the above depending if `path` is file, list_ofs or directory
data = csv.load(path="path/to/any")
# depending if single file or multiple files either dictionary representing json file or {file_name: json_value}
Writing¶
For writing, the fil_io package provides sometimes some more options for making life easier. The concept this package is designed, is to work most likely with data in form of a dictionary. Therefore, often shortcuts are provided.
Let’s have a look to row-based file-type csv (comma separated values): You can provide either row-based data (in python this would be a list of lists), or you can provide a dictionary instead and let fil_io take care of the conversion. This little magic is part of the fil_io.convert module, more details below.
from fil_io import csv
# lets start with row-based data
example_rows = [
["Header1", "Header2", "Header3"],
["Value11", "Value12", "Value13"],
["Value21", "Value22", "Value23"]
]
csv.write_from_rows(file_name="path/to/csv.csv", rows=example_rows)
# The result in the file:
# Header1,Header2,Header3
# Value11,Value12,Value13
# Value21,Value22,Value23
# in difference with data in form of a dictionary
example_dict = {
"Header1": {
"Value11": {
"Header2": "Value12",
"Header3": "Value13"
},
"Value21": {
"Header2": "Value22",
"Header3": "Value23"
}
}
}
csv.write_from_dict(file_name="path/to/csv.csv", data=example_dict)
# The result in the file is the same:
# Header1,Header2,Header3
# Value11,Value12,Value13
# Value21,Value22,Value23
# additionally the data can be provided without the naming of the main_key
# (in this case "Header1")
example_dict2 = {
"Value11": {
"Header2": "Value12",
"Header3": "Value13"
},
"Value21": {
"Header2": "Value22",
"Header3": "Value23"
}
}
csv.write_from_dict(
file_name="path/to/csv.csv",
data=example_dict,
main_key_name="Header1",
main_key_position=0
)
# The result in the file is still the same:
# Header1,Header2,Header3
# Value11,Value12,Value13
# Value21,Value22,Value23
Again, there is a function combining both writing methods, available also with a shortcut stated
in the very beginning of the examples: csv.write
xls/xlsx Files¶
The Microsoft Excel file interaction works slightly different since sheets are a feature not available to standard file formats like json, csv or xml. The standard output format is Pandas DataFrame.
Yet, interaction is still fairly simple:
from fil_io import xls
data_frame = xls.load_single_sheet(file_name="path/to/file.xls") # .xlsx works with the same function
# returns a pandas.data_frame from first sheet
# you can specify a sheet_name
data_frame = xls.load_single_sheet(file_name="path/to/file.xls", sheet="Sheet_Name")
# returns a pandas.data_frame from sheet with provided name
# of course multiple sheets can be loaded
data = xls.load_these_sheets(file_name="path/to/file.xls", sheets=["Sheet_Name1", "Sheet_Name2"])
# just like the other loading functions, the sheet_name is the key in a dictionary containing the data_frame as value
# {"Sheet_Name": DataFrame}
# loading all sheets
data = xls.load_all_sheets(file_name="path/to/file.xls")
# {"Sheet_Name": DataFrame}
# reading multiple files is possible as well
data = xls.load_theses(file_name_list=["path/to/file1.xls", "path/to/file2.xls"])
# {file_name: {sheet_name: DataFrame}}