1.0.0 home page
Description:
Welcome to the Data Science Playbook. Hello, Data Science World! This project is a collection of code snippets for doing data science. This site is dedicated to using machine learning to train models that train humans to learn about machine learning.

Why the project is useful: Data science is better when you can grab code snippets from a recipe book. It makes every single data science byte extra tasty.

How users can get started with the project: Feel free to grab snippets of code and use them. Just remember, this code is available on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.

Where users can get help with the project: If you have ideas for improving this repository, send an email to DataSciencePlaybook@gmail.com.
Example:
Topic | Page |
---|---|
1-Home | 1.0.0 home page |
Import Data | 2.0.0 Import from csv |
Import Data | 2.0.1 Import from tab delimited text file |
Import Data | 2.0.2 Import from file with custom delimiter |
Import Data | 2.0.3 Import from Excel file |
Import Data | 2.0.4 Import from pickle |
Import Data | 2.0.5 Import from Google Sheet |
Import Data | 2.0.6 Import from SQL database |
Import Data | 2.0.7 Import limited set of columns from text file |
Import Data | 2.0.8 Specify data types of imported data from text file |
Import Data | 2.0.9 Specify data types of imported data from Excel |
Import Data | 2.0.10 Import limited set of columns from Excel |
Export Data | 3.0.0 Export to csv |
Export Data | 3.0.1 Export to tab delimited text file |
Export Data | 3.0.2 Export to file with custom delimiter |
Export Data | 3.0.3 Export to Excel |
Export Data | 3.0.4 Export to pickle |
Export Data | 3.0.5 Export a dataframe to a database table |
Export Data | 3.0.6 Specify which column names you want to be exported for flat files |
Data Connections | 6.0.0 Construct a filepath |
Data Connections | 6.0.1 Add a Date Stamp for a filepath |
Data Connections | 6.0.2 Create a SQLite Connection |
Python to Dataframe | 1.0.0 Create Empty Dataframe |
Python to Dataframe | 1.0.1 Create Empty Dataframe with column names |
Python to Dataframe | 1.0.2 Create dataframe and add columns from lists |
Python to Dataframe | 1.0.3 Convert a column to a list |
Python to Dataframe | 1.0.4 Export two columns to a python dictionary |
Databases | 5.0.0 Import a Password from a file stored outside your code |
Databases | 5.0.1 SQLite Connection |
Databases | 5.0.2 Oracle Connection |
Databases | 5.0.3 SQL Server Connection Trusted Connection |
Databases | 5.0.4 Query a Database with SQL and create a DataFrame |
Table Definitions | 7.0.0 Get column names |
Table Definitions | 7.0.1 Get column datatypes |
Table Definitions | 7.0.2 Total rows and columns |
Table Definitions | 7.0.3 Total rows - including nulls |
Table Definitions | 7.0.4 Total columns |
Table Definitions | 7.0.5 Get column names, counts of values, datatypes, and memory usage |
Table Definitions | 7.0.6 Count non-null instances in column |
Table Definitions | 7.0.7 Get number of rows that are NULL |
Table Definitions | 7.0.8 Get summary of null values for all columns |
Table Definitions | 7.0.9 Count unique or distinct values in a column |
Table Definitions | 7.0.10 Get percent of times each value occurs |
Table Definitions | 7.0.11 Count duplicate values and get sum of each value in column |
Table Definitions | 7.0.12 Count of values where value equals x in column |
Table Definitions | 7.0.13 Count number of duplicate rows |
Table Modifications | 8.0.0 Limit DataFrame to only specific columns |
Table Modifications | 8.0.1 Set the order of the columns |
Table Modifications | 8.0.2 Drop specific columns |
Table Modifications | 8.0.3 Change column names |
Table Modifications | 8.0.4 Change column data type to float |
Table Modifications | 8.0.5 Change column data type to int |
Table Modifications | 8.0.6 Change column data type to string |
Table Modifications | 8.0.7 Change column data type to datetime |
Table Modifications | 8.0.8 Reference - Date Format Symbols |
Data Formatting | 9.0.0 Extract Year from Date |
Data Formatting | 9.0.1 Extract Month Number from Date |
Data Formatting | 9.0.2 Extract Day of the Month from Date |
Data Formatting | 9.0.3 Extract Day of the Week Number from Date |
Data Formatting | 9.0.4 Extract Day of the Week Name from Date |
Data Formatting | 9.0.5 Extract Time from Date |
Data Formatting | 9.0.6 Extract Hour from Date |
Data Formatting | 9.0.7 Extract Minute from Date |
Data Formatting | 9.0.8 Filtering with Dates |
Data Formatting | 9.0.9 Difference between two dates |
Data Formatting | 9.0.10 Create a Lagged Feature |
Data Formatting | 9.0.11 Deal with null values |
Data Formatting | 9.0.12 Apply a function to process text |
Data Formatting | 9.0.13 Count total characters |
Data Formatting | 9.0.14 Count total words |
Data Formatting | 9.0.15 Count occurrences of a specific word |
Data Formatting | 9.0.16 Capitalize first letter in a sentence |
Data Formatting | 9.0.17 Capitalize first letter of each word in a sentence |
Data Formatting | 9.0.18 Convert to upper case |
Data Formatting | 9.0.19 Convert to lower case |
Data Formatting | 9.0.20 Remove punctuation from text |
Data Formatting | 9.0.21 Strip leading and trailing spaces |
Data Formatting | 9.0.22 Stem words in a string |
Data Formatting | 9.0.23 Return nth word in a string |
Data Formatting | 9.0.24 Return nth sentence in a string |
Data Formatting | 9.0.25 Return Substring between two words |
Conditional Logic | 10.0.0 Query a DataFrame with SQL |
Conditional Logic | 10.0.1 Where Column Equals Value |
Conditional Logic | 10.0.2 Where Column Does NOT Equal Value |
Conditional Logic | 10.0.3 Where Column IN List |
Conditional Logic | 10.0.4 Where Column NOT IN List |
Conditional Logic | 10.0.5 Where Column is Null |
Conditional Logic | 10.0.6 Where Column is Not Null |
Conditional Logic | 10.0.7 Where multiple conditions are all true - AND Logic |
Conditional Logic | 10.0.8 Where one or more condition is true - OR Logic |
Conditional Logic | 10.0.9 CASE WHEN logic - Option 1 - pandas .loc |
Conditional Logic | 10.0.10 CASE WHEN logic - Option 2 - np.where |
Conditional Logic | 10.0.11 CASE WHEN logic - Option 3 - Create and apply a custom function |
Conditional Logic | 10.0.12 Create a rank column |
Combine Group and Sort | 11.0.0 JOIN or MERGE two dataframes |
Combine Group and Sort | 11.0.1 STACK and UNION DataFrames on top of each other |
Combine Group and Sort | 11.0.2 Append a new row to a DataFrame with a Dictionary |
Combine Group and Sort | 11.0.3 Append a new row to a DataFrame with a pd.Series |
Combine Group and Sort | 11.0.4 Use SQL to perform a Group By operation |
Combine Group and Sort | 11.0.5 Sort ascending and descending |
Combine Group and Sort | 11.0.6 Get TOP X Rows |
Combine Group and Sort | 11.0.7 Get BOTTOM X Rows |
Combine Group and Sort | 11.0.8 Get random sample of X rows |
Combine Group and Sort | 11.0.9 Filter to rows in a list of index values, such as a range |
Combine Group and Sort | 11.0.10 Filter using index position |
Combine Group and Sort | 11.0.11 Create and iteratively fill an empty DataFrame |
SQL DF | Format a SQL string with a parameter |
Descriptive Stats | 12.0.0 Generate summary statistics |
Descriptive Stats | 12.0.1 Sum |
Descriptive Stats | 12.0.2 Count non-null values |
Descriptive Stats | 12.0.3 Average or mean |
Descriptive Stats | 12.0.4 Median |
Descriptive Stats | 12.0.5 Mode |
Descriptive Stats | 12.0.6 Standard Deviation |
Descriptive Stats | 12.0.7 Min |
Descriptive Stats | 12.0.8 Max |
Descriptive Stats | 12.0.9 Quantiles |
Descriptive Stats | 12.0.10 Two Standard Deviations from the mean |
Descriptive Stats | 12.0.11 Z-Score |
Data Visualization | 13.0.0 Bar Chart |
Data Visualization | 13.0.1 Histogram |
Data Visualization | 13.0.2 Cool Charts |
Data Exploration for ML | Data Exploration for ML |
Preprocessing | 1.0.0 Use a Scikit-Learn Transformer |
Preprocessing | 1.0.1 Replace Null with most frequent value |
Preprocessing | 1.0.2 Create Dummy Variables with the OneHotEncoder |
Pipelines | 1.0.0 Full Pipeline example |
Pipelines | 1.0.1 Create a pipeline containing sub-pipelines |
Pipelines | 1.0.2 Fit a pipeline on a dataset |
Pipelines | 1.0.3 Use a fitted pipeline to transform a new dataset |
Pipelines | 1.0.4 Fit a pipeline and transform a dataset |
Pipelines | 1.0.5 Get feature names generated by a pipeline |
Pipelines | 1.0.6 Convert processed new data from numpy array to a DataFrame with names |
Models | Full ML Example |
Useful Snippets | 14.0.0 Import custom libraries to run in your Python program |
Useful Snippets | 14.0.1 Create a .bat file to run a Python file |
Useful Snippets | 14.0.2 Prevent a .bat file from closing after it finishes running until you press a key |
Useful Snippets | 14.0.3 Copy a file to a new location |
Useful Snippets | 14.0.4 Get the current date and time as a string |
Useful Snippets | 14.0.5 Log when an action happens |
Useful Snippets | 14.0.6 Run another Python program from within your Python program |
Useful Snippets | 14.0.7 Calculate the amount of memory available |
Useful Snippets | 14.0.8 Delete a dataframe from memory |
Useful Snippets | 14.0.9 Calculate how much time it takes to do something |
Useful Snippets | 14.0.10 Use SQL to see how many rows are in a table |
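
The table lists recipes by title only. As a rough illustration of how several of them can chain together, here is a minimal pandas sketch. The inline sample data, the column names, and the choice to fill missing `units` with 0 are all hypothetical. The group-by step uses pandas `groupby` rather than the SQL approach named in 11.0.4, and the playbook's own pages may implement each recipe differently.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical sample data standing in for a real CSV file.
CSV_TEXT = """region,product,units,price
East,apple,10,1.5
West,apple,,1.5
East,pear,4,2.0
West,pear,7,2.0
East,apple,3,1.5
"""

# 2.0.0 Import from csv: read_csv accepts any file-like object, not just a path.
df = pd.read_csv(io.StringIO(CSV_TEXT))

# 7.0.2 Total rows and columns.
n_rows, n_cols = df.shape

# 7.0.8 Summary of null values for all columns.
null_counts = df.isnull().sum()

# 7.0.10 Percent of times each value occurs (as a fraction of 1).
product_share = df["product"].value_counts(normalize=True)

# 9.0.11 Deal with null values: filling with 0 is an assumption for this sketch.
df["units"] = df["units"].fillna(0).astype(int)

# 10.0.3 Where column IN list.
pears = df[df["product"].isin(["pear"])]

# 10.0.10 CASE WHEN logic with np.where.
df["order_size"] = np.where(df["units"] >= 5, "large", "small")

# Group by region in pandas (11.0.4 does this with SQL), then 11.0.5 sort descending.
revenue = (
    df.assign(revenue=df["units"] * df["price"])
    .groupby("region", as_index=False)["revenue"]
    .sum()
    .sort_values("revenue", ascending=False)
)

# 11.0.6 Get TOP X rows (X = 1 here).
top_region = revenue.head(1)

print(n_rows, n_cols)
print(null_counts.to_dict())
print(product_share.to_dict())
print(revenue.to_string(index=False))
```

Because each step returns a new DataFrame or Series, the recipes can be reordered or dropped independently. The one exception is the null fill, which must run before any arithmetic on `units`.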