Unanswered Questions

349 questions with no upvoted or accepted answers
6 votes
3 answers
205 views

What Clustering Method Should I Use?

My data is a set of 10,000 points (each with a node location (x, y)) spread across a plane. They are also color-coded by weight. I need to finalize a Bayesian ...
6 votes
0 answers
118 views

Fixed-radius range search in non-Euclidean space

I'm trying to find the indexing data structure most suitable for my metric space: a set of IP-network-related data (IP addresses, ports, TCP flags, ...); the distance function is continuous, non-Euclidean ...
6 votes
3 answers
354 views

Anomaly detection using clustering of highly correlated categorical data

My data has two highly correlated columns, e.g. if column1 has value ABC, column2 should be XYZ, i.e. ABC-->XYZ. If column2 has anything else, it's an anomaly. Likewise, there are thousands ...
5 votes
1 answer
346 views

Clustering time series based on monotonic similarity

Context: I am clustering 1500 time series of 500 observations into a few clusters. The time series all share the same observed properties at different spatial locations, but ...
3 votes
0 answers
17 views

Clustering metric - why is clustering accuracy (ACC) not as popular as ARI?

I have samples with their ground-truth (GT) clusters. I want to measure the success of different clustering algorithms. It seems that, when the GT is available, it is popular to use ARI (Adjusted Rand Index). I saw there ...
3 votes
0 answers
251 views

Cluster tabular data with text in some columns

Let's say I have the following features in my dataframe: user_id (integer), user_age (integer), is_student (binary), is_graduate (binary), salary (integer), resume (text, up to 1000 symbols). And also a few more ...
3 votes
0 answers
169 views

Clustering large set of images

I've got some big datasets of images (a few million each), and I would like to cluster them by visual similarity. I've extracted a feature vector for each image; the space of ...
3 votes
1 answer
926 views

How to compare topics generated from topic modeling from different datasets?

I have two datasets of a similar theme. Let's assume Dataset A and Dataset B. Using the top2vec model (https://github.com/ddangelov/Top2Vec) (https://arxiv.org/abs/2008.09470) on each dataset, I came ...
3 votes
2 answers
1k views

Clustering mixed data types - numeric, categorical, arrays, and text

I have a dataset with 4 types of data columns: ...
3 votes
3 answers
305 views

How to decide whom to market to? Clustering or decision tree?

I am working with a dataset that has enough observations and ~10 variables: half of the variables are numeric, the other half are categorical with 2-3 levels (demographics), one ID ...
3 votes
1 answer
50 views

Visualizing the difference of a set of strings

I have a distance metric on a collection of tens of thousands of strings. What would be an intuitive way to summarize how 'different' these strings are or when they overlap? My goal is, ...
3 votes
3 answers
672 views

What value can I gain by doing exploratory data analysis on features (and thus data) before doing clustering?

This might not be a very good question, but I would still like to ask: is it beneficial to do EDA before running a clustering algorithm? I understand that EDA helps us generate good and helpful insights ...
3 votes
2 answers
3k views

Is k-means with Mahalanobis a valid option for clustering?

I want to know whether k-means with Mahalanobis distance is a mathematically/methodologically correct option for datasets whose clusters have different variances. The steps are: Create aggregate datasets (...
3 votes
1 answer
47 views

Identify members who are likely to switch where they receive drug administration

I have access to medical claims data from a large health insurance company. As some of you may know, the price of drug X differs greatly depending on where it is administered. My company ...
3 votes
0 answers
206 views

How to remove noise using morphological filtering

I have two groups of dots with noise between them. The line that separates the two groups in the picture is diagonal. I tried to use morphological filtering on this image to ...
