Learn and develop essential AI skills with the Microsoft Learn AI Skills Challenge. Join the technical community in your region and attend live sessions while progressing through the challenges. The challenge begins on July 17th and runs through August 14th. Preview the topics by signing up today.
Sign Up Now!
👋 Hey,
"Predictive analytics is the fuel that powers the engine of artificial intelligence, enabling machines to learn, adapt, and make decisions with human-like intelligence."
- Jeremy Achin, CEO of DataRobot.
By leveraging the power of predictive analytics, machines gain the ability to learn, adapt, and make decisions with a level of human-like intelligence, revolutionizing numerous industries.
Embark on an insightful learning experience with DataPro #51, where we bring you a wealth of practical, production-ready solutions for sharpening your data and machine learning skills.
To kick things off, join the free Microsoft Learn AI Skills Challenge and elevate your AI understanding while seamlessly integrating it into your work. Delve into the intricacies of Time Series Indexing with an excerpt from the new book by Mihalis Tsoukalos. Gain valuable insights into the GPT-4 API, now generally available, and take a deep dive into AI Image Restoration by comparing Real-ESRGAN and SwinIR. Explore the step-by-step guide on Building a Powerful Recommendation System with Palm 2 Model and Streamlit, and then unlock the power of OpenAI's GPT with No-code AI Builder.
Our captivating lineup continues with topics such as Simplify Airflow DAG Creation and Maintenance with Hamilton, mastering the art of Scraping Large Datasets at Scale, and tapping into Data Modeling Success with 3 Must-Have Contextual Tables.
Prepare yourself for a transformative and enriching journey of knowledge and skill-building.
What are your thoughts on this week’s newsletter? We would appreciate it if you could take a moment to participate in the brief survey below. As a token of our gratitude, you will receive a complimentary PDF copy of "The Applied Artificial Intelligence Workshop" eBook upon completion. Let's make the DataPro Newsletter even better together!
Share your Feedback!
Cheers,
Merlyn Shelley
Editor-in-Chief, Packt
Whether you're looking to break into a new field or upskill to access better opportunities, the Packt library can help. With thousands of titles (and dozens more added every month), you can explore whatever tickles your fancy.
Visit our platform, browse, and watch this space for the next announcement on getting free access to our full catalogue.
Browse the Library
diffusion-human-feedback - This codebase is a modification of openai/guided-diffusion for implementing censored sampling using human feedback.
clip-as-service - CLIP-as-service is a low-latency, high-scalability service for embedding images and text. It can be easily integrated as a microservice into neural search solutions.
semantic-kernel - Semantic Kernel (SK) is a lightweight SDK enabling integration of AI Large Language Models (LLMs) with conventional programming languages.
SuperAGI - A dev-first, open-source autonomous AI agent framework that enables developers to build, manage, and run useful autonomous agents quickly and reliably.
langchain-serve - Jina is an open-source framework for scalable multi-modal AI applications, while LangChain is an open-source framework for LLM-powered apps. Deploy LangChain apps on Jina AI Cloud quickly with langchain-serve.
Integrate SaaS platforms with Amazon SageMaker to enable ML-powered applications: AWS ISV [independent software vendor] partners have integrated their SaaS platforms with SageMaker, enabling users to utilize its training, deployment, and model registry features. This post explores the benefits, integrations, and development process, along with common architectures and AWS resources. It aims to accelerate time-to-market for ISV partners and inspire SaaS providers and customers to collaborate on these integrations. The integration process is divided into four stages: data access, model training, model deployment and artifacts, and model inference.
Implement Oracle GoldenGate bidirectional replication between Amazon RDS for Oracle databases: This post explores Oracle GoldenGate (OGG) bidirectional replication between Amazon RDS for Oracle instances, enabling highly available and resilient mission-critical applications across Regions. Active-active replication allows independent read/write operations with synchronized changes, supporting global deployments with zero downtime and zero data loss. Using OGG, customers can leverage existing licenses and expertise to enhance their AWS database deployments.
Cloud SQL for PostgreSQL - A deep dive into VACUUM FAQs: This blog post focuses on explaining the internal functions of VACUUM in PostgreSQL, highlighting its importance in reclaiming space occupied by dead tuples, managing disk space, and maintaining database performance. It covers transaction ID freezing, autovacuum processes, manual execution, parallelism, and performance optimization flags.
Pic2Word: Mapping pictures to words for zero-shot composed image retrieval: The article introduces "Pic2Word," a method for zero-shot composed image retrieval (ZS-CIR). It proposes using image-caption pairs and unlabeled images to train a retrieval model, eliminating the need for costly labeled triplet data. By leveraging the language capabilities of the CLIP [contrastive language-image pre-trained model] model, images are converted into word tokens for flexible composition with text descriptions. The effectiveness of the trained model is validated through experiments on various CIR tasks.
The operation and the details of SAX are fully described in a research paper titled Experiencing SAX: a novel symbolic representation of time series, which was officially published back in 2007.
We will begin by explaining the terms PAA and SAX. PAA stands for Piecewise Aggregate Approximation. The PAA representation offers a way to reduce the dimensionality of a time series. PAA is also explained in the Experiencing SAX: a novel symbolic representation of time series paper.
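As a rough illustration of the dimensionality reduction that PAA performs, here is a minimal Python sketch (not the book's implementation): the series is split into equal-length segments and each segment is replaced by its mean. The function name paa() is hypothetical, and for simplicity the sketch assumes that the series length is divisible by the number of segments, which the general method does not require.

```python
import numpy as np


def paa(ts, segments):
    """Hypothetical PAA sketch: reduce a time series to `segments` values,
    each one the mean of an equal-length slice of the series."""
    ts = np.asarray(ts, dtype=float)
    if len(ts) % segments != 0:
        # Simplifying assumption of this sketch, not of PAA itself.
        raise ValueError("len(ts) must be divisible by segments")
    # Reshape into (segments, points_per_segment) and average each row.
    return ts.reshape(segments, -1).mean(axis=1)


print(paa([1, 2, 3, 4, 5, 6, 7, 8], 4))  # [1.5 3.5 5.5 7.5]
```

Eight observations become four values, which is exactly the kind of summary that saves space and speeds up comparisons.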
PAA and SAX are closely related, as the idea behind SAX is based on PAA. The SAX representation is a symbolic representation of time series. Put simply, it represents a time series in a summary form in order to save space and increase speed.
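To show how the symbolic step builds on PAA, here is a minimal sketch assuming the standard SAX recipe: z-normalize the series, reduce it with PAA, and then map each PAA value to a letter using breakpoints that split the standard normal distribution into equally probable regions. The function name sax_word(), the lowercase alphabet, and the requirement that the length be divisible by the number of segments are illustrative assumptions, not taken from the book; statistics.NormalDist requires Python 3.8 or later.

```python
from statistics import NormalDist

import numpy as np


def sax_word(ts, segments, alphabet_size):
    """Hypothetical SAX sketch: z-normalize, apply PAA, then map each PAA
    value to a letter using equiprobable N(0, 1) breakpoints."""
    ts = np.asarray(ts, dtype=float)
    std = ts.std()
    # Z-normalize; a (nearly) constant series becomes all zeros.
    z = np.zeros_like(ts) if std < 1e-6 else (ts - ts.mean()) / std
    # PAA step (assumes len(ts) is divisible by `segments`).
    means = z.reshape(segments, -1).mean(axis=1)
    # alphabet_size - 1 breakpoints taken from standard normal quantiles.
    breakpoints = [NormalDist().inv_cdf(i / alphabet_size)
                   for i in range(1, alphabet_size)]
    # side="right": a value equal to a breakpoint goes to the upper region.
    idx = np.searchsorted(breakpoints, means, side="right")
    return "".join(chr(ord("a") + int(i)) for i in idx)


print(sax_word([1, 2, 3, 4, 5, 6, 7, 8], segments=4, alphabet_size=4))  # abcd
```

Because the breakpoints make every letter equally likely for normalized data, an eight-value series collapses into a four-character word that can be stored and compared cheaply.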
Normalization
Normalization is the process of adjusting values that use different scales to a common scale. Although various types of normalization exist, what we need here is standard score (z-score) normalization, one of the simplest forms of normalization, because it is the one used for time series and subsequences. The following function shows how to normalize a time series with some help from the NumPy Python package:
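```python
import numpy as np

def normalize(x):
    eps = 1e-6
    mu = np.mean(x)
    std = np.std(x)
    if std < eps:
        # Constant (or nearly constant) series: avoid dividing by ~0.
        return np.zeros(shape=x.shape)
    else:
        # Standard score: subtract the mean, divide by the standard deviation.
        return (x-mu)/std
```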
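The standard score of each observation is computed in the return value of the previous function, (x-mu)/std: the mean is subtracted from every observation and the result is divided by the standard deviation. NumPy is clever enough to calculate that value for each observation without the need for a for loop. If the standard deviation is smaller than the tiny threshold stored in the eps variable, meaning that the series is practically constant, then normalize() returns a NumPy array full of zeros instead of dividing by a value close to 0.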
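The normalize.py script gets a time series as input and returns its normalized version. The original listing relies on the previously developed normalize() function without repeating it; the function is included below so that the script runs on its own. The code is as follows:
```python
#!/usr/bin/env python3
import sys

import numpy as np
import pandas as pd

def normalize(x):
    # Standard score (z-score) normalization, as developed above.
    eps = 1e-6
    mu = np.mean(x)
    std = np.std(x)
    if std < eps:
        return np.zeros(shape=x.shape)
    else:
        return (x - mu) / std

def main():
    if len(sys.argv) != 2:
        # Expects one argument: a gzip-compressed CSV file with the time series.
        print("Usage:", sys.argv[0], "TS")
        sys.exit(1)
    F = sys.argv[1]
    # One value per line, no header row.
    ts = pd.read_csv(F, compression='gzip', header=None)
    ta = ts.to_numpy()
    # Flatten the (n, 1) array into a one-dimensional array of length n.
    ta = ta.reshape(len(ta))
    taNorm = normalize(ta)
    # Print the normalized values with four decimal places.
    print("[", end=' ')
    for i in taNorm.tolist():
        print("%.4f" % i, end=' ')
    print("]")

if __name__ == '__main__':
    main()
```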
The last for loop of the program prints the contents of the taNorm NumPy array with four decimal places so that the output takes up less space. Before iterating, the program converts the taNorm NumPy array into a regular Python list using the tolist() method.
We are going to feed normalize.py a short time series; however, the script also works with longer ones.
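To try normalize.py yourself, you need an input in the format it reads: pd.read_csv(F, compression='gzip', header=None) expects a gzip-compressed CSV file without a header row, and the script then flattens the data into a single series, so one value per line works. The helper below is a hypothetical sketch that uses only the standard library to create such a file with random values; the function name write_ts(), the file name ts.csv.gz, and the value range are arbitrary choices.

```python
import csv
import gzip
import random


def write_ts(path, length, seed=0):
    """Hypothetical helper: write `length` random values, one per row and
    without a header, to a gzip-compressed CSV file."""
    rng = random.Random(seed)  # fixed seed for reproducible test data
    with gzip.open(path, "wt", newline="") as f:
        writer = csv.writer(f)
        for _ in range(length):
            writer.writerow([round(rng.uniform(-10, 10), 4)])


write_ts("ts.csv.gz", 16)  # a short time series with 16 observations
```

You can then run python3 normalize.py ts.csv.gz; increasing the length argument produces the longer inputs the script also handles.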
This excerpt is taken from the recently published book "Time Series Indexing," written by Mihalis Tsoukalos and published by Packt in June 2023. To preview the book's content, be sure to read the whole chapter available here, or sign up for a 7-day free trial to access the complete Packt digital library. To explore more, click on the button below.
Discover Fresh Concepts, Keep Reading!
GPT-4 API general availability and deprecation of older models in the Completions API: Starting from July 6th, 2023, GPT-4 is accessible to paying API customers. OpenAI recommends transitioning to the Chat Completions API and plans to deprecate older models in the Completions API. Existing API developers with a history of successful payments can access GPT-4 now, and access for new developers will be granted later this month. GPT-3.5 Turbo, DALL·E, and Whisper APIs are now generally available, and support for fine-tuning GPT-4 and GPT-3.5 Turbo is expected. Users of the Edits API are advised to migrate to GPT-3.5 Turbo by January 4, 2024.
Comparing Real-ESRGAN and SwinIR: A Deep Dive into AI Image Restoration: This article explores the Real-ESRGAN and SwinIR models for AI-powered image restoration. Real-ESRGAN specializes in super-resolving low-resolution images, while SwinIR employs the Swin Transformer architecture for various image restoration tasks. AIModels.fyi is a valuable resource for discovering and comparing AI models. The guide highlights the strengths, differences, and ideal use cases of the models and explains how AIModels.fyi can be used to expand the toolkit for image enhancement and restoration.
Unlock the power of OpenAI's GPT with No-code AI Builder: Microsoft's AI Builder now supports prebuilt Azure OpenAI models, enabling users to integrate powerful AI capabilities into their applications without coding or data science expertise. By leveraging these models, users can add advanced features like natural language processing and text generation to enhance efficiency, decision-making, and user experiences. The integration of Power Virtual Agents with AI Builder and Azure OpenAI's GPT models enhances chatbot functionality, providing more accurate and context-aware responses for improved customer satisfaction and operational efficiency.
Building a Powerful Recommendation System with Palm 2 Model and Streamlit: A Step-by-Step Guide: This article discusses the PaLM API, based on Google's PaLM 2 model, which excels in various capabilities like text and chat generation. It provides details on the different variations of PaLM 2 and how to choose the appropriate one for your use case. Additionally, it explores the use of Streamlit, an open-source Python library, for creating and deploying data apps. The author shares their experience in building a real-time movie recommendation system using the PaLM 2 LLM, the TMDB API, and Streamlit. The article focuses on designing a recommendation system by integrating multiple APIs and components.
How to Scrape Large Datasets at Scale: This article introduces Bright Data's Web Scraper IDE, a tool for scraping datasets at scale. It highlights the benefits of using the IDE, including accessibility, scalability, accuracy, code templates, and Web Unlocker for avoiding CAPTCHAs and blocking. Bright Data is a proxy network that facilitates turning websites into structured data. The article provides a guide on using the Web Scraper IDE to create custom dataset scripts without the risk of being blocked by anti-bot systems.
Unlocking Data Modeling Success: 3 Must-Have Contextual Tables: To simplify data modeling for analytics teams, three generic tables are proposed for ingestion into a Data Warehouse: a Date Dimension for time series reporting, a Zip Code Dimension for geospatial reporting, and an FX Rates Fact Table for financial analysis. By incorporating publicly available data into the Data Warehouse, teams can benefit from a centralized source of data, ensuring consistency and scalability in reporting. This approach streamlines analytics processes, reduces reporting discrepancies, and facilitates data-driven decision-making.
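The article's own table design is not reproduced here; as a rough illustration of what a date dimension typically holds, here is a minimal standard-library sketch that emits one row per calendar day. The function name date_dimension() and the columns (date_key, quarter, is_weekend, and so on) are hypothetical choices, not taken from the article.

```python
from datetime import date, timedelta


def date_dimension(start, end):
    """Hypothetical sketch: one row per calendar day from start to end
    (inclusive) with a few common reporting attributes."""
    rows = []
    day = start
    while day <= end:
        rows.append({
            "date_key": int(day.strftime("%Y%m%d")),  # e.g. 20230706
            "date": day.isoformat(),
            "year": day.year,
            "quarter": (day.month - 1) // 3 + 1,
            "month": day.month,
            "day_of_week": day.isoweekday(),  # 1 = Monday ... 7 = Sunday
            "is_weekend": day.isoweekday() >= 6,
        })
        day += timedelta(days=1)
    return rows


dim = date_dimension(date(2023, 1, 1), date(2023, 12, 31))
print(len(dim))  # 365
```

Rows like these would typically be loaded into the warehouse once and joined to fact tables on the date key, so every report derives quarters and weekends the same way.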
Simplify Airflow DAG Creation and Maintenance with Hamilton in 8 minutes: This post highlights the benefits of combining two open-source projects, Hamilton and Airflow, for effective data pipeline orchestration. Airflow handles macro-level orchestration, while Hamilton facilitates clean and maintainable data transformations at a micro-level. Hamilton can be seamlessly integrated into an Airflow setup due to its small dependency footprint. Airflow is widely used for various data initiatives such as ETL, ML pipelines, and BI, but users have encountered challenges in authoring and maintaining data pipelines, which Hamilton aims to address.
Domain Adaption: Fine-Tune Pre-Trained NLP Models: This comprehensive guide discusses the process of fine-tuning pre-trained NLP models for domain adaptation. It focuses on using a siamese neural network to capture semantic similarity in specific contexts. The tutorial covers the theoretical framework, data preparation, model evaluation, and fine-tuning process using the Universal Sentence Encoder as an example. The results showcase the effectiveness of fine-tuning for improving similarity scores within a domain. By following this approach, users can maximize the potential of pre-trained NLP models and enhance their natural language processing tasks.
See you next time!